FOREWORD


In  1980  the  Assistant  Secretary  of  Defense  directed  all 
services  to  pursue  a  long-range  systematic  program  to  validate 
the  Armed  Services  Vocational  Aptitude  Battery  (ASVAB)  and  to 
reevaluate  enlistment  standards  against  on-the-job  performance. 

As  a  result,  the  Army  has  been  investigating  the  validity  of  the 
ASVAB,  as  well  as  several  new  predictor  measures,  for  a  sample  of 
20 diverse military occupational specialties (MOS). This effort,
known  as  Project  A,  has  been  very  successful  in  validating  the 
ASVAB,  as  well  as  providing  the  Army  with  a  greater  understanding 
of knowledge, skills, abilities, and other personal characteristics
(KSAOs) required for these 20 MOS.

A  major  question  now  facing  the  Army  is  how  to  extend  the 
wealth  of  data  collected  for  Project  A  to  the  other  250-plus 
entry-level  Army  MOS  and  to  new  MOS  created  for  new  hardware 
systems  as  they  become  operational.  A  second  challenge  is  to 
determine the methods needed for setting job performance standards
that can be used in making selection and classification
decisions.

The Army's Synthetic Validity Project (SYNVAL) addresses
these challenges. Specifically, the objectives of SYNVAL have
been to (a) evaluate synthetic validation techniques for determining
MOS-specific selection composites for each MOS; and (b)
evaluate alternative methods for setting minimum qualifying
scores on each of these composites. The research proceeded in
three iterative phases. The third and final phase was recently
completed. This document provides information on Phase III research
plans, objectives, and results.

Based on the results of the evaluations, recommendations
have been made on the most promising (a) methods
for developing job performance prediction equations for all of
the Army's 250-plus entry-level MOS; and (b) methods for setting
performance standards for these MOS. The technical quality of
this  project  was  guided  by  the  Scientific  Advisory  Committee: 

Phil  Bobko  (Chair),  Robert  Linn,  Richard  Jaeger,  Joyce  Shields, 
and  Robert  Guion. 


EDGAR  M.  JOHNSON 
Technical  Director 
ARMY SYNTHETIC VALIDITY PROJECT:
REPORT  OF  PHASE  III  RESULTS 
Volume  1 


EXECUTIVE  SUMMARY 


Requirement: 

For new military occupational specialties (MOS) and for many
existing MOS, empirical research to identify and validate an optimal
composite of selection measures for a particular Army enlisted
MOS cannot always be carried out. The selection problem
is compounded when estimates are needed for the validity of the
composite for predicting job performance, and when minimum qualifying
scores and appropriate cut scores for other critical selection
decisions are needed.


Procedure:

The Synthetic Validity Project was directed at overcoming
this problem by identifying and evaluating alternative procedures
for (a) identifying an optimal composite of selection measures
for any Army MOS and estimating the validity of this composite
for predicting job performance; and (b) setting a minimum qualifying
score, or standard, to assure a reasonable probability of
successful job performance, as well as cutting scores for other
critical selection decisions (e.g., for selecting recruits with
potential for outstanding performance).

There  are  three  research  phases  in  the  Project.  In  each 
phase,  synthetic  validation  procedures  and  standard  setting 
procedures  were  developed  or  refined  and  then  tried  out  on  a  new 
sample  of  MOS. 

For the Phase III synthetic validation portion of the research,
a major goal was to replicate and extend procedures for
generating synthetic prediction equations for 18 MOS. The Army
Task  Questionnaire  was  used  to  obtain  job  description  judgments. 
Predictors  were  linked  via  expert  judgment  to  the  job  components. 
Various  ways  of  generating  prediction  equations  were 
investigated. 

For  the  Phase  III  standard  setting  research,  the  task-based 
and  critical  incident  methods  for  setting  standards  were  refined. 
These  procedures  were  further  developed  to  better  identify  job 
performance  standards  for  each  job  and  to  link  these  standards  to 
scores  on  the  predictor  composite  for  that  job. 
Finally, computer software was developed to demonstrate the
linkage  between  test  scores  and  job  performance  acceptability 
levels. 


Findings:

• As a consequence of the results obtained in earlier
phases of the project, the attribute model and the job
behavior method were set aside and the Army Task Questionnaire
became the tool of choice for use in synthetic
validation. While all methods provided reliable descriptions,
the task questionnaire yielded greater
discriminability across MOS and seemed to have higher
acceptability among the judges.

• The synthetic validation methods produced equations that
have only slightly lower absolute validities than least
squares equations developed directly on the jobs themselves,
depending on the criterion and method of forming
the synthetic equation.

•  The  most  significant  conclusion  of  the  standard  setting 
research  was  that  the  different  methods  that  we  developed 
and  evaluated  led  to  different  results.  Very  strict 
standards  were  set  when  performance  was  described  in 
terms  of  "Percent  Go"  scores  on  hands-on  task  performance 
tests. 

•  We  developed  a  computer  program  to  demonstrate  the 
linkage  between  test  scores  and  acceptability  levels. 


Utilization  of  Findings: 

The  synthetic  validation  approach  provides  feasible  methods 
to  develop  a  prediction  equation  for  MOS  for  which  empirical  data 
are  not  available.  Based  on  the  research  described  here,  there 
are several good options available but no clear-cut choice between
them. The synthetic method and validity transportability
methods  produced  absolute  validities  over  .60. 

Specific procedures for scaling standards of performance are
also feasible. Both the task-based and behavioral incident
standard setting instruments provided reliable data. When developing
standards, job experts should fully understand the objectives
and the consequences of the standard setting exercise. It
seems likely that the frame of reference for the judgments will
influence the level of performance designated as the standard.
ARMY SYNTHETIC VALIDITY PROJECT:

REPORT  OF  PHASE  III  RESULTS 
Volume  I 

Chapter  1:  Introduction  and  Overall  Objectives 
John  P.  Campbell  (HumRRO)  and  Lauress  L.  Wise  (AIR) 


The  two  major  objectives  of  the  Army  Synthetic  Validity 
Project  are  to  identify  and  evaluate  procedures  for 

•  identifying  an  optimal  composite  of  selection  measures 
for  any  Army  enlisted  Military  Occupational  Specialty 
(MOS)  and  estimating  the  validity  of  this  composite  for 
predicting  job  performance,  and 

•  setting  a  minimum  qualifying  score  so  as  to  assure  a 
reasonable  probability  of  successful  job  performance,  as 
well  as  other  appropriate  cutting  scores  for  other 
critical  selection  decisions  (e.g.,  for  selecting 
recruits  with  potential  for  outstanding  performance). 

Synthetic  validation  approaches  typically  begin  with  the 
identification  of  a  set  of  job  components  that  can  be  used  to 
describe  the  population  of  jobs  being  studied.  A  prediction 
equation  is  derived  for  linking  available  selection  tests  to  each 
component.  Subject  matter  experts  (SMEs)  are  asked  to  identify 
the  importance  of  each  component  to  overall  job  performance. 
Finally,  the  prediction  equations  for  the  various  components  are 
weighted  according  to  the  importance  judgment  weights  and  summed 
to obtain an equation for predicting overall performance for the
job.
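
The weighting-and-summing step described above can be sketched in a
few lines of Python. This is a minimal illustration, not the project's
software: the function name, the array layout, the rescaling of the
importance judgments to sum to one, and all numeric values are
assumptions made for the example.

```python
import numpy as np


def synthetic_equation(component_weights, importance):
    """Combine component prediction equations into one job-level equation.

    component_weights: shape (n_components, n_predictors); row k holds the
        predictor weights of the prediction equation for job component k.
    importance: shape (n_components,); SME importance judgments of each
        component for one job.
    Returns the predictor weights for predicting overall job performance.
    """
    B = np.asarray(component_weights, dtype=float)
    w = np.asarray(importance, dtype=float)
    if B.ndim != 2 or w.shape != (B.shape[0],):
        raise ValueError("need one importance weight per component equation")
    if np.any(w < 0) or w.sum() == 0:
        raise ValueError("importance weights must be non-negative, not all zero")
    w = w / w.sum()  # assumption: rescale importance judgments to sum to 1
    return w @ B     # importance-weighted sum of the component equations


# Hypothetical values: two job components, two selection tests.
component_weights = [[0.5, 0.1],   # component 1: weights on test 1, test 2
                     [0.1, 0.4]]   # component 2: weights on test 1, test 2
importance = [3, 1]                # SME judgments for one hypothetical job
job_weights = synthetic_equation(component_weights, importance)
```

Because only the importance judgments change from job to job, the same
component equations can be reused for a job with no empirical
validation data of its own.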


The  standard  setting  task  of  the  Synthetic  Validity  Project 
is  charged  with  developing  procedures  for  specifying  minimum 
qualifying  scores  and  other  appropriate  cut  scores  on  the 
predictor  composites  identified  for  each  job.  Procedures  are 
being  developed  for  identifying  job  performance  standards  for 
each  job,  and  these  performance  standards  will  then  be  linked  to 
scores  on  the  predictor  composite  for  that  job. 
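
One way to picture the link between a performance standard and a
predictor cut score is sketched below. The sketch assumes that the
predictor composite and job performance are standardized and jointly
normal with a known validity coefficient; that assumption, the
function names, and the numeric values are illustrative only and are
not taken from the procedures developed in the project.

```python
from math import sqrt
from statistics import NormalDist


def prob_success(validity, standard_z, score_z):
    """P(performance >= standard) for a person at predictor score score_z.

    Assumes a standardized bivariate normal model with correlation `validity`.
    """
    resid_sd = sqrt(1.0 - validity ** 2)  # spread of performance around its prediction
    return 1.0 - NormalDist().cdf((standard_z - validity * score_z) / resid_sd)


def min_qualifying_score(validity, standard_z, target_prob):
    """Lowest standardized predictor score giving at least target_prob of success."""
    if not 0.0 < validity < 1.0:
        raise ValueError("validity must be strictly between 0 and 1")
    if not 0.0 < target_prob < 1.0:
        raise ValueError("target_prob must be strictly between 0 and 1")
    resid_sd = sqrt(1.0 - validity ** 2)
    z = NormalDist().inv_cdf(1.0 - target_prob)
    return (standard_z - resid_sd * z) / validity


# Hypothetical case: validity .60, standard set at the performance mean,
# and an 80 percent chance of meeting the standard required.
cut = min_qualifying_score(0.6, 0.0, 0.8)
```

Under these assumptions, a stricter performance standard or a higher
required probability of success raises the cut score, and a lower
validity raises it further.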

The  Army  Context 

The critical importance of the Synthetic Validity Project's
objectives flows directly from the complexity of the Army's
personnel  management  tasks,  which  are  both  difficult  and  subject 
to  more  severe  constraints  than  virtually  any  other  large 
organization.  For  example,  during  the  past  10  years 
approximately  400,000  -  500,000  people  have  applied  each  year 
for  110,000  -  130,000  openings.  The  available  openings  are 
distributed  unevenly  across  approximately  275  different  jobs 
ranging  from  infantryman  to  helicopter  mechanic  to  paramedic  to 
administrative/clerical  specialist.  Each  new  accession  goes 
immediately to basic training and then to advanced training in
his  or  her  chosen  specialty.  The  number  of  training  slots  that 
will  be  available  is  budgeted  at  least  one  year  in  advance,  and 
many  cost/benefit  parameters  are  optimized  if  every  seat  is 
filled  with  appropriate  people  on  the  day  the  class  starts.  The 
individual  MOS  assignment  is  a  function  of  training  seat 
availability  at  a  particular  time,  the  current  priority  for 
"filling"  the  MOS,  the  individual's  preference,  and  whether  or 
not  the  individual's  scores  on  the  Armed  Services  Vocational 
Aptitude Battery (ASVAB) meet certain cutoffs. This is a complex
decision  process  which  must  take  place  very  quickly  and  is  made 
on  the  basis  of  a  relatively  small  amount  of  information. 

External  issues  about  which  the  Army  must  be  concerned  are  the 
fluctuating  labor  supply  with  its  current  downward  trend  and  the 
ups  and  downs  of  the  federal  budget  which  have  a  direct  effect  on 
resources  devoted  to  recruiting  and  the  resulting  nature  of  the 
applicant  pool.  At  the  same  time,  new  equipment  and  new  systems 
have  been  developed  and  the  technical  content  and  ability 
requirements  of  almost  all  MOS  have  increased  markedly.  A  more 
recent  constraint  is  the  reduction  in  the  number  of  new 
accessions  resulting  from  changes  in  the  global  political 

As  a  consequence  of  all  of  the  above,  optimal  selection  and 
classification  have  become  more  critical  than  ever.  Reduced 
resources  place  an  even  greater  premium  on  accurate  selection  and 
effective classification. At the same time, there is constant
pressure  on  all  the  defense  services  to  provide  evidence  that 
their  personnel  decision-making  procedures  are  appropriate  and 
valid.  As  an  organization,  the  Army  is  a  very  large  and  very 
visible  employer. 


Projects  A  and  B 

The  Synthetic  Validity  Project  is  functionally  related  to 
two  other  research  and  development  projects  aimed  at  improving 
the  Army's  selection  and  classification  decision  making 
procedures:  Project  A  and  Project  B. 

Project  B 

Project  B  is  based  on  theory  and  method  in  econometrics  and 
operations  research.  It  has  developed  the  models  and  software 
for  an  enlisted  personnel  assignment  system  that  takes  into 
account: 


•  forecasts  of  the  future  applicant  supply 

•  forecasts  of  personnel  needs  in  each  MOS 

•  hiring  goals  for  different  subpopulations 

•  the  rate  at  which  training  class  slots  are  currently 
filling 
• the MOS priorities designated by the Army

•  the  differential  utility  of  different  expected  levels  of 
performance  within  and  across  MOS 

•  the  level  of  selection  accuracy  and  differential 
prediction  across  MOS  provided  by  the  predictor  battery. 

Project  B  is  intended  to  produce  a  state-of-the-art 
algorithm  for  optimizing  personnel  decisions,  given  certain 
goals,  and  for  conducting  a  wide  variety  of  "what  if"  exercises 
regarding  changes  in  labor  supply,  priorities,  utilities,  and 
criterion  content. 

Project  A 

Project  A  is  a  very  large  personnel  selection  and 
classification  validation  project  that  was  intended  to  use  a 
sample  of  jobs  (MOS)  from  the  entire  population  of  enlisted  MOS 
to  validate  both  the  existing  test  battery  (ASVAB)  and  a  battery 
of  newly  developed  selection/classification  tests  against  a 
comprehensive  set  of  performance  measures.  The  major  research 
issues  revolved  around: 

•  how  to  define  and  measure  job  performance 

•  the  tradeoff  between  the  number  of  jobs  vs.  the  sample 
size  for  each  job,  given  that  resources  did  not  permit 
drawing a sample from each of the 275 MOS

•  identification  of  predictor  domains  with  the  highest 
potential  for  adding  selection  validity  and 
classification  validity  to  the  existing  ASVAB 

•  how  specific  variables  should  be  targeted  to  represent 
each  critical  domain  for  predictor  development 

•  how  performance  measures  should  be  aggregated  into 
composites  for  validation  purposes 

•  how  the  utility  of  performance  across  MOS  can  be  scaled 

•  how  to  choose  optimal  predictor  batteries  for  different 
goals  (e.g.,  maximizing  performance  vs.  minimizing 
attrition) 

•  how  to  choose  predictor  batteries  and  estimate  validity 
for  jobs  (MOS)  for  which  no  empirical  data  could  be 
obtained.

The  Project  A  data  base  is  critical  for  certain  steps  in  the 
Synthetic  Validity  Project  procedure,  and  it  is  briefly 
summarized  in  the  following  sections. 
Design and method. To pursue the project's objectives while
addressing the above issues, the following design was used in
Project A. There were two major validation samples: (a) a
concurrent validation (CV) sample was taken from the cohort of
1983/84 enlistees into 19 MOS and measured on both the new
predictors and new criterion measures in 1985; and (b) a
longitudinal validation (LV) sample was assessed on the
predictors when they entered the Army in 1986/87 and tested on
the performance measures in 1988/89. This second sample included
three additional MOS with one CV MOS deleted for a total of 21
MOS. Each MOS sample consisted of approximately 250 to 750 people,
with the MOS selected to be representative of the entire
population of enlisted MOS. Consequently, from each validation sample
predictor and criterion measurement data are available for
approximately 10,000 individuals.

Criterion  measures  were  developed  by  conducting  an  extensive 
task  analysis  and  critical  incident  analysis  of  each  MOS.  All 
available  sources  and  multiple  expert  reviews  were  used  to 
generate  a  full  listing  of  all  tasks  in  each  MOS  as  well  as 
judgments  about  the  criticality  and  difficulty  of  each  task  and 
the  similarity  among  tasks.  For  a  representative  sample  of 
critical  tasks  in  each  MOS,  job  sample  (hands-on)  exercises, 
paper-and-pencil  knowledge  tests,  and  rating  scales  were 
developed.  The  critical  incident  analysis  produced  a  complete 
set  of  performance  dimensions  for  each  MOS,  and  behavioral  rating 
scales  were  developed  for  each  of  the  dimensions  that  survived 
the  critical  incident  retranslation  and  SME  reviews.  In 
addition,  rating  scales  were  developed  to  assess  expected 
performance  in  combat.  Finally,  existing  administrative  records 
were  examined  and  six  variables  retained  as  performance 
indicators  (e.g.,  number  of  awards  and  letters  of  commendation). 
The  full  performance  assessment  required  12  hours  per  individual. 

Potential  new  predictor  variables  were  selected  through  a 
painstaking  process  of  literature  search,  expert  review,  and 
evaluation  of  previous  research.  The  goal  was  to  produce  a  four- 
hour  battery  of  new  tests  that  would  maximize  the  chances  of 
improving  selection/classification  accuracy  for  the  entire  system 
(i.e.,  population  of  MOS).  In  the  end,  the  domains  from  which 
the  experimental  predictors  were  sampled  were  the  following  (in 
addition to the ASVAB):

•  spatial  ability 

•  perceptual  speed  and  accuracy 

•  psychomotor  abilities 

•  personality/temperament 

•  vocational  interests 

•  biographical  history 
The major steps in the analysis were directed first at
developing a basic set of predictor scores from the four-hour
battery, a basic set of performance scores from the 12 hours of
criterion assessment, and a model of performance that would
account for the covariances among criterion scores. Then the
correlations  between  each  predictor  score  and  each  criterion 
score  for  each  MOS  were  calculated,  and  an  analysis  of 
differential  prediction  across  criterion  dimensions  within  MOS 
(e.g.,  do  different  measures  predict  different  dimensions  of 
performance  for  given  jobs)  and  across  MOS  for  each  major 
criterion  dimension  (e.g.,  do  different  measures  predict  the  same 
dimension  of  performance  for  different  jobs)  was  carried  out. 

Results. After analysis of the CV sample data, the subtests
of the ASVAB plus the four-hour battery of experimental tests
were arrayed into 24 predictor scores. They are listed in Figure
1.1.


On  the  basis  of  the  CV  sample,  the  multiple  performance 
measures  were  first  aggregated  into  28  to  31  basic  criterion 
scores  (depending  on  the  MOS)  by  means  of  expert  judgment  panels 
and  exploratory  factor  analyses.  A  confirmatory  analysis 
procedure  was  then  used  to  test  the  fit  of  these  basic  scores 
with  alternative  models  of  the  latent  criterion  structure.  The 
best  fitting  model  included  five  content  factors  and  two  method 
factors.  They  are  shown  as  Figure  1.2. 

The  concurrent  validation  analyses  generated  a  24 
(predictors)  by  5  (criteria)  matrix  of  validity  coefficients  for 
each MOS. These matrices were examined for the level of average
validity,  for  profiles  of  validities  across  predictors  for  each 
criterion  factor,  for  patterns  of  validities  across  the  five 
factors  within  MOS,  and  validity  patterns  across  MOS  for  each  of 
the  five  criterion  factors.  The  following  conclusions  summarize 
the  results: 

•  Each  of  the  five  criterion  factors  can  be  predicted  with 
considerable accuracy but not by the same predictors.

•  There  is  considerable  differential  prediction  across 
criterion  factors  within  each  MOS.  This  suggests  that 
different  goals  could  be  emphasized  in 
selection/classification  (e.g.,  maximizing  technical 
performance  vs.  minimizing  discipline/motivational 
problems).

•  The  only  criterion  factor  to  show  significant 
differential  prediction  across  MOS  was  the  core  technical 
performance  factor.  For  the  other  four  performance 
components,  the  same  predictor  profile  was  found  in  each 
MOS.
[Figure 1.1. Project A test content and predictor composite scores.
The figure maps each set of tests to its composite scores: ASVAB
subtests to General Cognitive Ability composites; Spatial Battery
tests (Assembling Objects, Map, Mazes, Object Rotation, Orientation,
Figural Reasoning) to the Spatial Ability composite; Computer Battery
tests to Perceptual-Psychomotor Ability composites; ABLE scales to
Temperament-Personality composites; AVOICE scales to Vocational
Interests; and JOB scales to Job Reward Preference composites.]


[Figure 1.2. Mapping of performance measures into performance
constructs. The figure crosses the criterion measures (among them
AWB Effort, AWB Discipline, AWB Fitness, AWB Overall, M16
qualification, MOS Technical, MOS Other, Cmbt Perform Well, and
Cmbt Avoid Mistakes) with the latent performance constructs: the
content constructs (Technical Proficiency, General, Leadership,
Personal Discipline, and Physical Fitness/Military Bearing) and the
method constructs (Written Knowledge Tests and Rating Scales).

Note. AWB = Army Wide BARS; HO = Hands-on; JK = Job Knowledge;
SK = School Knowledge.]
The Basic Issue


Using  Project  A  results,  optimal  prediction  equations  can  be 
developed  for  21  MOS,  and  classification  efficiency  can  be 
examined  across  the  same  21.  However,  the  Army  must  select  and 
assign  people  to  approximately  275  MOS.  When  implemented,  the 
Project  B  algorithm,  or  one  like  it,  must  lead  to  a  decision  for 
all applicants. Because individual allocation decisions must be
made  in  real  time,  it  is  not  possible  to  optimize  personnel 
assignments  for  a  particular  period  via  batch  processing.  To 
make  decisions  in  real  time,  the  system  needs  meaningful 
"standards"  for  each  MOS  that  specify  the  optimal  constraints. 

Obtaining  Prediction  Equations  for  All  MOS 

There  are  three  major  ways  to  approach  this  issue: 

•  Empirical  validation  could  be  carried  out  for  all  275 
MOS 

•  Because  the  21  MOS  were  selected  to  be  representative  of 
clusters  of  MOS  judged  to  be  similar  in  content  within 
each  cluster,  validity  generalizations  could  be  assumed 
within  each  cluster  and  examined  empirically  across  the 
21.  That  is,  the  significant  differential  prediction 
across  MOS  for  the  Core  Technical  Proficiency  (CTP) 
factor may be accounted for by fewer than 21 equations.

•  A  synthetic  validation  procedure  could  be  used  to  select 
a  predictor  battery  for  each  MOS.  The  21  MOS  in  the 
Project  A  sample  provide  a  means  for  empirically 
validating  any  such  synthetic  procedures. 

The last strategy is the focus of the Synthetic Validity
Project.  If  a  successful  synthetic  validation  procedure  could  be 
developed,  it  would  provide  a  less  costly  way  (and  perhaps  the 
only  feasible  way)  of  developing  selection/classification 
procedures  for  new  MOS,  for  MOS  that  have  undergone  significant 
changes,  or  for  MOS  that  have  relatively  few  people  in  them. 

Setting standards. An analogous set of possible procedures
could  be  used  to  develop  performance  standards  and  concomitant 
selection  standards  for  individual  MOS.  Performance  scaling  and 
empirical  selection  standards  could  be  developed  for  each  job, 
the  standards  for  the  focal  job  in  a  cluster  could  be  generalized 
to  all  other  MOS  in  the  cluster,  or  a  synthetic  procedure  could 
be  developed  for  inferring  standards  on  jobs  for  which  empirical 
data  are  not  available. 

The  remaining  chapters  in  this  report  describe  the  major 
parts  in  this  synthetic  validation  and  standard  setting  effort, 
a  brief  overview  of  which  is  given  below. 
A General Overview of
the  Army  Synthetic  Validity  Project 

The  "synthetic  validity"  approach  was  first  introduced  by 
Lawshe  (1952)  as  an  alternative  to  the  situational  validity 
approach,  which  requires  separate  validity  analyses  for  each  job 
in  the  organization.  Balma  (1959)  defined  synthetic  validity  as 
"discovering  validity  in  a  specific  situation  by  analyzing  jobs 
into  their  components,  and  combining  these  validities  into  a 
whole."

Guion  (1976)  provides  a  review  of  several  approaches  to 
conducting  synthetic  validation.  The  approach  most  relevant  to 
the  problem  at  hand  involves: 

•  identifying  job  components  that  are  common  across  a  range 
of  jobs 

•  using  criterion-related  validity  information  or  expert 
judgment  to  estimate  the  validity  of  potential  predictors 
of  each  component  of  job  performance 

•  developing  predictor  composites  for  each  job  by  combining 
the  prediction  equations  for  each  of  the  job  components 
that  are  relevant  to  the  job. 

The  usefulness  of  this  variant  of  synthetic  validation 
depends on three critical operations.

First,  a  set  of  components  must  be  identified  that  cover  all 
important  aspects  of  performance  in  all  enlisted  jobs.  The 
taxonomy  of  job  components  must  be  reasonably  exhaustive  of  the 
job  population  such  that  the  critical  parts  of  any  particular  job 
can  be  described  completely  by  some  subset  of  the  complete 
taxonomy  of  all  relevant  components.  In  addition,  there  must  be 
a  group  of  subject  matter  experts  (SMEs)  available  who  understand 
these  components  well  enough  to  provide  reliable  and  accurate 
importance  or  relevance  weights  for  the  components  in  a 
particular  job. 

Second,  it  must  be  possible  to  establish  equations  for 
predicting  performance  on  each  component  from  current  or 
potential  selection  measures.  The  prediction  equation  for  a 
given  component  must  be  independent  of  the  particular  job  for 
which  the  component  is  judged  relevant.  Either  empirical  or  a 
combination  of  empirical  and  judgment-based  procedures  must  be 
used  to  establish  the  predictive  relationships  for  each 
component.  There  also  must  be  reliable  differences  between  the 
prediction  equations  for  different  components.  To  the  extent 
that  the  same  measures  predict  all  components  of  performance,  the 
overall  prediction  equations  will  necessarily  be  the  same  across 
jobs.  In  such  a  case  a  validity  generalization  model  would 
apply,  and  there  would  be  no  basis  for  differential 
classification.


Third,  synthetic  validation  models  assume  that  overall  job 
performance  can  be  expressed  as  the  weighted  or  unweighted  sum  of 
individual performance on the critical components. Composite
prediction equations are typically expressed as the corresponding
sum of the individual component prediction equations. To
estimate  the  validity  of  the  composite  prediction  equation, 
validity  estimates  for  the  predictors  of  each  component  are 
needed  and  some  further  assumptions  are  required.  Most 
typically,  it  is  assumed  that  errors  in  estimating  different 
components  of  performance  are  uncorrelated. 
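
For standardized predictors, the validity of a weighted composite
follows from a standard correlation identity, r = b'v / sqrt(b'Rb),
where b holds the composite weights, v the correlation of each
predictor with the criterion, and R the predictor intercorrelations.
The sketch below applies that identity. It illustrates the arithmetic
only and is not the estimation procedure used in the project; the
function name and the numeric values are hypothetical.

```python
import numpy as np


def composite_validity(weights, validities, predictor_corr):
    """Correlation between a weighted predictor composite and a criterion.

    weights: composite weights b for standardized predictors.
    validities: correlations v of each predictor with the criterion.
    predictor_corr: predictor intercorrelation matrix R.
    """
    b = np.asarray(weights, dtype=float)
    v = np.asarray(validities, dtype=float)
    R = np.asarray(predictor_corr, dtype=float)
    composite_var = b @ R @ b  # variance of the weighted composite
    if composite_var <= 0:
        raise ValueError("composite variance must be positive")
    return float(b @ v / np.sqrt(composite_var))


# Hypothetical values: two equally weighted tests with validities .30
# and .40 that correlate .50 with each other.
r = composite_validity([1.0, 1.0], [0.3, 0.4], [[1.0, 0.5], [0.5, 1.0]])
```

With these hypothetical values the composite validity is about .40,
no better than the stronger test alone, because equal weights are not
optimal when the two tests overlap this much.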

While the bulk of the literature on synthetic validation
comes  from  studies  done  in  industry  (Mossholder  &  Arvey,  1984), 
the  extant  literature  on  standard  setting  has  been  generated 
largely  within  an  educational  context.  The  concern  has  been 
either  with  criterion  referenced  testing  of  student  achievement 
or  with  certification  standards  for  teachers.  Within  this 
context  there  are  three  principal  issues:  (a)  the  relevance  and 
completeness  of  the  content  sampled  for  measurement,  (b)  the 
validity  of  the  response  capability  (e.g.,  declarative  knowledge 
vs.  procedural  skill)  incorporated  into  the  measurement  method, 
and  (c)  the  method  used  to  provide  criterion  referenced  scale 
values  for  selected  points  on  the  performance  continuum.  This 
literature  has  been  reviewed  by  Hambleton  (1980)  and  more 
recently  as  part  of  this  project  by  Pulakos,  Wise,  Arabian,  Heon, 
and  Delaplane  (1989). 

For  the  Army,  a  critical  issue  is  the  identification  of  the 
appropriate  individuals  (i.e.,  expert  judges)  to  impose 
performance  standards  on  the  existing  performance  distribution  or 
to scale performance scores in terms of defined standards. The
literature  indicates  that  not  all  standard  setting  procedures 
produce  the  same  results.  The  purpose  of  the  current  project  is 
to  identify  the  method(s)  which  maximize  reliability,  relevance, 
and  acceptability  in  the  context  of  setting  standards  for  first 
tour  performance  and  selection  standards  for  entry  into  specific 
MOS. 


Project  Design 

The  general  design  of  the  Synthetic  Validity  Project  has 
been  as  follows.  After  a  thorough  literature  search,  we  outlined 
a set of alternative methods for describing job components.
These were based on our own and previous work in constructing
taxonomies  of  human  performance  (e.g.,  Fleishman  &  Quaintance, 
1984).  Four  principal  kinds  of  components  or  descriptive  units 
for  analyzing  jobs  were  initially  proposed:  behavior  description 
approaches  (e.g.,  handling  objects),  behavior  requirements 
approaches  (e.g.,  decision  making),  ability  requirements 
approaches  (e.g.,  finger  dexterity),  and  task  characteristics 
approaches  (e.g.,  fires  main  gun). 
After an initial review of alternative types of components,
we decided to combine behavior requirements and ability
requirements and to proceed with three approaches. The first was
a  Job  Behaviors  Model.  The  components  were  defined  as  general 
job  behaviors  that  are  not  task  specific,  but  which  can  underlie 
several  job  tasks.  Examples  might  be  "recalling  verbal 
information"  or  "driving  heavy  equipment."  For  this  approach  we 
attempted  to  identify  a  set  of  performance  behaviors  that  can  be 
meaningfully  linked  to  predictor  measures.  Some  concerns  were 
that  it  may  be  difficult  to  develop  the  taxonomy  of  behavior  in 
sufficient  detail  to  be  useful,  that  the  judgments  of  job 
relevance  may  be  difficult,  and  that  general  "behaviors"  as 
descriptors  may  not  be  accepted  by  those  making  the  judgments . 

The descriptive units in the second approach were Job Tasks.
An initial list of performance tasks was developed in Project A
from duty area descriptions for the 111 enlisted jobs with the
largest number of incumbents. These descriptions provided a
basis for defining job components that are clusters of tasks
rather than behaviors within tasks. The chief advantages of this
model  were  a  close  match  to  previous  empirical  validity  data  and 
the  familiarity  of  SMEs  with  these  kinds  of  descriptions.  The 
primary  concerns  were  that  the  taxonomy  may  not  be  complete 
enough  to  handle  new  jobs  and  that  the  relationships  of  job 
component  performance  to  individual  predictors  may  be  difficult 
to  determine  reliably  and  accurately. 

The  third  approach  was  called  the  Individual  Attribute 
Model.  In  this  approach,  the  components  were  job  requirements 
described  in  terms  of  mental  and  physical  abilities,  interests, 
traits,  and  other  individual  difference  dimensions.  This  model 
eliminated the need to establish links between predictors and job
components because the attributes (job requirements) are the
predictors. The chief concerns with this approach were that there
may  be  no  SMEs  who  know  enough  about  both  the  job  and  the  human 
attribute  dimensions  to  describe  job  requirements  accurately  and 
that  this  approach  may  not  be  as  acceptable  as  a  method  based  on 
more  specific  job  descriptors. 

Again,  one  major  objective  of  the  project  has  been  to 
evaluate  the  alternative  models  in  terms  of  how  well  they  help 
meet the Army's needs for selection and classification decision
procedures for each MOS.

The investigation of standard setting also used three
general approaches. First, judges were asked to give a direct
estimate of the percentage of individuals in the current force
who meet certain specified standards (e.g., "marginal": marginal
individuals need additional training and skill development or
they should not stay in the Army). Second, a sample of critical
incidents of performance was judged/scaled in terms of the
absolute level of performance that each represented. Third, task
performance, portrayed as the results obtained from administering
standardized task tests (such as those used in Project A), was
also scaled by judges in terms of the standard of performance
represented by various score levels.

Given  these  three  different  approaches  to  standard  setting, 
the  Synthetic  Validity  Project  has  systematically  investigated 
the  reliability  of  such  judgments,  the  agreement  across  methods 
and  variations  within  methods,  the  reactions  of  the  judges  to 
each  method  and  the  comparative  results  of  general  versus 
specific  standards.  The  standard  setting  investigation  has  also 
included  a  comparison  of  various  combinatorial  rules  for 
aggregating  component  standards  and  methods  for  inferring 
selection  standards  from  performance  standards. 

Procedure 


The  Synthetic  Validity  Project  has  followed  an  iterative 
procedure.  This  iterative  approach  provides  an  opportunity  for 
revisions  of  the  models  and  research  methods  followed  by 
evaluation  of  a  more  refined  version  of  each  approach.  The 
design  specified  first  a  series  of  exploratory  workshops  to 
assess  the  completeness  and  clarity  of  each  of  the  alternative 
procedures  followed  by  three  phases  of  further  development  and 
evaluation.  In  Phase  I,  initial  procedures  were  tested  for  three 
of  the  Project  A  MOS.  In  Phase  II,  revised  procedures  were 
tested  for  seven  more  Project  A  MOS.  The  final  revisions  of  the 
procedures  were  tested  in  Phase  III  using  10  Project  A  MOS  and 
one  MOS  not  sampled  by  Project  A. 

Throughout  the  project  design,  the  emphasis  has  been  on  the 
identification  and  evaluation  of  alternative  approaches  to  the 
implementation  of  synthetic  validation  and  standard  setting 
procedures. We evaluate the extent to which each model can meet
the  requirements  for  effective  synthetic  validation  and  standard 
setting.  In  the  course  of  doing  that,  we  compare  the  results 
produced  by  different  types  of  judges  who  evaluated  the  relevance 
of  the  different  types  of  components  and  scaled  the  acceptability 
of  different  performance  levels  for  the  target  jobs.  The 
judgments produced by the type-of-judge/type-of-component
combinations are compared in terms of their distributional
properties, interjudge reliabilities, discriminability, and
acceptability.

Job  Description  Objectives  and  Findings 

Phase  I.  The  primary  goal  in  Phase  I  for  synthetic 
validation  was  to  obtain  and  evaluate  synthetic  prediction 
equations  for  three  MOS:  11B  (Infantryman),  63B  (Light-Wheel 
Vehicle  Mechanic),  and  71L  (Administrative  Specialist).  Three 
steps  were  necessary  to  accomplish  this  goal.  First,  three  job 
component  models  (consisting  of  tasks,  activities,  or  attributes) 
were  developed  and  used  to  obtain  job  description  judgments. 
Predictors  were  then  linked  via  expert  judgment  to  the  job 
components,  and  various  ways  of  generating  prediction  equations 
were  investigated.  A  second  goal  was  to  evaluate  differences  in 
the job descriptions generated by different types of judges, that
is, NCOs¹ versus Officers at FORSCOM² versus TRADOC³
installations.

For  synthetic  validation,  the  completion  of  Phase  I 
represented  a  major  accomplishment  for  the  project.  First,  it 
was  demonstrated  that  synthetic  validation  can  be  successfully 
carried  out  for  the  three  Phase  I  MOS  investigated.  Army  SMEs 
were  able  to  use  the  three  job  component  models  to  reliably 
describe  the  content  of  these  jobs.  Table  1.1  shows,  for  the 
Task  Category  and  Job  Activity  instruments,  adequate  single¬ 
rater  reliability  estimates  of  importance  ratings  for  Core 
Technical  Proficiency  and  Overall  Job  Performance.  Table  1.1 
also  shows  adequate  reliability  estimates  for  attribute  validity 
ratings  from  soldiers  and  psychologists  for  Core  Technical 
Proficiency.  Using  job  description  and  job  component  validity 
information,  prediction  equations  were  formed  that  were  valid  for 
predicting Core Technical Proficiency for each of the three jobs
(see Table 1.2). However, as Table 1.2 also shows, the
prediction equations, on average, offered little or no
discriminant validity⁴. That is, the synthetic equations were
so similar that the validity of an equation derived on one MOS
and applied to a second MOS differed little, if at all, from the
validity of the equation derived on the second MOS. Both the
absolute and the discriminant validities of the synthetic
composites were lower than those of the empirical composites.
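
The discriminant validity index used here (mean diagonal minus mean off-diagonal of the equation-by-MOS validity matrix) can be illustrated with a short Python sketch. The matrix values below are invented for illustration and are not Project A results, and the function name is hypothetical.

```python
def discriminant_validity(v):
    """Return (mean absolute validity, discriminant validity).

    v[i][j] is the validity of the equation derived for MOS i
    when it is applied to MOS j (a square matrix).
    """
    n = len(v)
    diagonal = [v[i][i] for i in range(n)]
    off_diagonal = [v[i][j] for i in range(n) for j in range(n) if i != j]
    mean_diagonal = sum(diagonal) / len(diagonal)
    mean_off_diagonal = sum(off_diagonal) / len(off_diagonal)
    # Discriminant validity: mean diagonal minus mean off-diagonal.
    return mean_diagonal, mean_diagonal - mean_off_diagonal


# Invented validities for three MOS (illustration only).
example = [
    [0.56, 0.54, 0.55],
    [0.53, 0.55, 0.54],
    [0.55, 0.53, 0.54],
]
absolute, discriminant = discriminant_validity(example)
print(round(absolute, 2), round(discriminant, 2))
```

An index near zero, as in this invented example, means that an equation built for one MOS predicts performance in the other MOS about as well as their own equations do.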

Phase  II.  A  principal  objective  of  Phase  II  was  to 
replicate the results from Phase I on a larger set of jobs: 16S
(MANPADS Crewman), 19E/K (Armor Crewman), 67N (Utility Helicopter
Repairer), 76Y (Unit Supply Specialist), 88M (Motor Transport
Operator), 91A/B (Medical Specialist), and 94B (Food Service
Specialist). A fourth job descriptor model was developed which
combined the task and activity models and was appropriately
labeled the "hybrid" model. Further, the methodology for using
the attribute model was expanded to include a rank ordering of
the attributes in addition to attribute validity estimates. We
compared the four job descriptor methods on a number of
distributional and psychometric properties that could serve as
indicators of their comparative value for synthetic validation.
Three major parameters characterize the alternative methods:
(a) type of descriptor (task, activity, "hybrid," or attribute);
(b) type of response scale (frequency, importance, difficulty,
and validity estimates); and (c) type of expert judge (NCOs and
Officers at FORSCOM and TRADOC installations). By comparing the


¹Non-commissioned officers.

²Refers to operational units (Forces Command).

³Refers to training and doctrine units (Training and Doctrine Command).

⁴Mean diagonal minus mean off-diagonal.
Table 1.1

Reliability Estimates of Phase I Job Description Ratings and
Validity Ratings

                                           MOS
                                   11B     63B     71L

Task Category Importance for
  Core Technical Proficiency       .52     .36     .40
  Overall Job Performance          .52     .43     .44

Job Activity Importance for
  Core Technical Proficiency       .36     .23     .43
  Overall Job Performance          .36     .25     .34

Attribute
  Validity (Soldiers)              .31     .34     .45
  Validity (Psychologists)         .42     .55     .52

Table 1.2

Comparing Synthetic and Empirical Composites Obtained in Phase I

                                                  Mean
                                  Absolute        Discriminant
Composites                        Validity        Validity

Empirical Composites                .67             .17

Synthetic Composites
  Task Category                     .55             .01
  Job Activity                      .53             .01
  Attribute (Soldiers)              .52             .02
  Attribute (Psychologists)         .58             .04

Note. Mean absolute validity was calculated by averaging across
the three Phase I MOS.
job descriptor methods, we hoped to identify a single job
component  model:  one  that  was  reliable  and  yielded  the  optimal 
differential  prediction  among  jobs. 

As  shown  in  Table  1.3,  acceptable  single-rater  reliability 
estimates  of  importance  ratings  for  Core  Technical  Proficiency 
and  Overall  Job  Performance  were  obtained  via  the  Task  Category, 
Job  Activity,  and  Hybrid  instruments.  Table  1.3  also  shows 
sufficient  reliability  estimates  for  attribute  validity  ratings 
and  rankings  for  Core  Technical  Proficiency. 
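
Because the estimates reported are single-rater reliabilities, it may help to see how such a value translates into the reliability of a mean rating taken over several judges. The sketch below uses the standard Spearman-Brown formula. It is offered as an illustration only: the text here does not state that this step was applied, and the reliability value and number of judges are hypothetical.

```python
def spearman_brown(single_rater_reliability, n_raters):
    """Projected reliability of the mean of n_raters parallel ratings."""
    r = single_rater_reliability
    return n_raters * r / (1 + (n_raters - 1) * r)


# Hypothetical case: single-rater reliability of .40, ratings averaged
# over 10 judges.
projected = spearman_brown(0.40, 10)
print(round(projected, 2))
```

Under these assumptions, a modest single-rater value yields a much more dependable mean rating once ratings are pooled across a workshop-sized group of judges.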


Table 1.3

Reliability Estimates of Phase II Job Description Ratings and
Validity Ratings

                                                 MOS
                                16S   19K   67N   76Y   88M   91A   94B

Task Category Importance for
  Core Technical Proficiency    .46   .55   .54   .73   .50   .43   .43
  Overall Job Performance       .56   .55   .56   .47   .48   .48   .44

Job Activity Importance for
  Core Technical Proficiency    .44   .37   .38   .24   .39   .26   .27
  Overall Job Performance       .47   .35   .34   .22   .36   .27   .25

Hybrid Importance for
  Core Technical Proficiency    .41   .43   .38   .35   .44   .34   .33
  Overall Job Performance       .46   .42   .39   .37   .42   .36   .34

Attribute Ratings
  Core Technical Proficiency    .21   .22   .30   .22   .21   .16   .15

Attribute Rankings
  Core Technical Proficiency    .38   .40   .53   .48   .38   .41   .37

In  identifying  a  prototypical  job  descriptor  model,  we 
placed  primary  emphasis  on  the  model's  ability  to  produce 
predictor  equations  that  provide  acceptable  validity  for  each  job 
and  adequate  differential  prediction  among  jobs.  Four  methods 
for  forming  criticality  weights  were  explored  which  involved 
various combinations of frequency and importance ratings. Three
variations  of  the  criticality  weights  were  investigated.  These 
variations,  labeled  "threshold"  methods,  assigned  non-zero 
criticality  weights  to  components  with  mean  frequency  or  core 
technical  importance  ratings  that  were  above  a  specified  cutoff 
(i.e.,  threshold).  Table  1.4  shows  Phase  II  absolute  and 
discriminant  validities  for  the  different  questionnaires  and  the 
Table 1.4

Comparing Synthetic and Empirical Composites Obtained in Phase II

                                Mean                   Mean
                          Absolute Validity    Discriminant Validity
Composites                 Vᵃ     Rᵇ     Uᶜ      Vᵃ     Rᵇ     Uᶜ

Empirical Composites             .69                   .08

Synthetic Composites
  Task Category            .57    .33    .61     .01    .02    .02
  Job Activity             .52    .30    .53     .01    .01    .01
  Hybrid                   .53    .32    .55     .01    .02    .02
  Attribute Ratings        .51    .31    .52     .01    .03    .01

Note. Mean validities were calculated by averaging across the
seven Phase II MOS, across the threshold models, and across the
criticality variations.

ᵃV = Validity estimates as predictor weights. ᵇR = Regression
derived predictor weights. ᶜU = Unit weights for predictors.


different methods of deriving predictor weights. Except for the
regression method validities, results are similar to those of
Phase I.
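
The threshold idea and the formation of a synthetic composite can be sketched in Python. This is an illustration only, not the project's actual computation: the component names, mean ratings, cutoff, validity estimates, and function names are all hypothetical. The sketch assumes that a predictor's weight is the criticality-weighted sum of its judged validities across job components, which corresponds loosely to the "validity estimates as predictor weights" variant.

```python
def threshold_criticality(mean_ratings, cutoff):
    """Give a component a non-zero weight only if its mean rating
    is above the cutoff (threshold); otherwise the weight is zero."""
    return {comp: (rating if rating > cutoff else 0.0)
            for comp, rating in mean_ratings.items()}


def synthetic_predictor_weights(criticality, component_validity):
    """Sum criticality x validity over components for each predictor."""
    weights = {}
    for comp, crit in criticality.items():
        for predictor, validity in component_validity[comp].items():
            weights[predictor] = weights.get(predictor, 0.0) + crit * validity
    return weights


# Hypothetical mean importance ratings for three job components.
mean_ratings = {"operate vehicles": 4.0, "record keeping": 1.5, "first aid": 3.0}
# Hypothetical judged validities of two predictors for each component.
component_validity = {
    "operate vehicles": {"spatial": 0.5, "verbal": 0.25},
    "record keeping": {"spatial": 0.0, "verbal": 0.5},
    "first aid": {"spatial": 0.25, "verbal": 0.5},
}

criticality = threshold_criticality(mean_ratings, cutoff=2.0)
weights = synthetic_predictor_weights(criticality, component_validity)
print(criticality)
print(weights)
```

In this invented case the low-rated component ("record keeping") drops out of the composite entirely, so only the components above the cutoff shape the predictor weights.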

The  Task  Category  model  emerged  as  the  prototypical  job 
descriptor  instrument  primarily  because  it  had  higher  absolute 
and  discriminant  validities  than  the  other  models,  but  also 
because  it  had  adequate  reliability  levels  and  was  acceptable  to 
Army SMEs.

Standard  Setting  Objectives  and  Findings 

Phase I. A major goal in Phase I standard setting was to
investigate different procedures for setting performance
standards. Performance level definitions were developed. Three
standard setting methods were developed to obtain component
standards. The first two methods reflect performance on tasks
(Task-Based) and behavioral examples (Critical Incident-Based).
The third method involves asking SMEs to directly estimate the
percentage of soldiers who are currently performing at various
levels of performance (Soldier-Based). A method was also
developed for combining the component standards.

Army SMEs found the performance level definitions to be
reasonable and workable. Many SMEs also reported that the
outcomes of the performance levels were realistic. As Table 1.5
shows, the three methods for setting standards resulted in
different standards. These methods also resulted in some
differences in the degree of consensus among judges in setting
the standards. Compared to the Critical Incident-Based and
Soldier-Based methods, the Task-Based method resulted in the
strictest standards, which meant that it implied the highest
proportion of unacceptable performance among incumbents. We also
found that SMEs reported difficulties in providing task-based
standards. In deriving an overall standard from component
standards, there was evidence that a linear compensatory model
accurately captures the judges' aggregation strategies.
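
A linear compensatory model combines component results as a weighted sum, so that strong performance on one dimension can offset weak performance on another. The Python sketch below illustrates the idea only. The dimension names, scores, and weights are hypothetical, and the conjunctive rule is included purely as a contrast; it is not a model attributed to the judges here.

```python
def linear_compensatory(scores, weights):
    """Weighted average of component scores (weights need not sum to 1)."""
    total_weight = sum(weights[dim] for dim in scores)
    return sum(weights[dim] * scores[dim] for dim in scores) / total_weight


def conjunctive(scores):
    """Contrast rule: overall level is limited by the weakest component."""
    return min(scores.values())


# Hypothetical soldier rated on a 1-to-4 scale on two dimensions.
scores = {"General Soldiering": 2.0, "Typing": 4.0}
weights = {"General Soldiering": 1.0, "Typing": 1.0}

print(linear_compensatory(scores, weights))  # strong Typing offsets the weak score
print(conjunctive(scores))                   # weak score alone determines the result
```

Unequal weights in the same function would represent the case in which some dimensions are more influential than others while the combination remains linear and compensatory.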

Phase  II.  Because  meaningful  standards  were  obtained  for 
the  three  jobs  in  Phase  I,  we  attempted  in  Phase  II  to  refine  the 
standard  setting  methods  to  yield  better  agreement  among  the 
judges  and  greater  convergence  across  methods.  The  standard 


Table 1.5

Methods of Judging Implied Percent of Soldiers Performing at Each
Level

                                             Percent          Percent
       Performance                         Unacceptable     Outstanding
MOS    Dimension            Method     N    Mean    SD      Mean    SD

11B    General Soldiering   Soldier    80    8.0    5.3     12.4    9.6
                            Task       81   21.0   14.9      7.7    9.4
                            Incident   80    6.3   13.3     11.6   15.0

63B    General Soldiering   Soldier    49    8.4    6.9     16.3   18.6
                            Task       50   23.0   14.6     11.0   12.1
       Basic Maintenance    Soldier    49   12.6   12.8     11.0   10.5
                            Task       50    6.0    7.4     34.4   20.8
                            Incident   49    4.4   16.3      8.8   12.6

71L    General Soldiering   Soldier    47   10.7   10.5     10.7    9.7
                            Task       51   18.9   12.6     11.9   11.6
       Typing               Soldier    47    8.1    5.5     12.0   13.8
                            Task       51   35.7   15.6      7.3    7.6
                            Incident   52   10.8   14.7      9.2   12.2
       Other Clerical       Soldier    47   10.3   13.0     10.8   14.4
                            Task       50   35.7   18.7      8.0    7.9
                            Incident   52    4.6   12.4      4.8    5.6

setting instruments used in Phase I (Soldier-Based, Task-Based,
and Critical Incident-Based) were also used in Phase II.
However, the Task-Based instrument was modified to incorporate
three judgmental procedures. The Task-Hypothetical Soldier
(Task-HS) method required SMEs to rate the acceptability of 10
hypothetical soldiers based on an examination of hands-on test
data for those soldiers. The Task-Detailed Percent GO (Task-DPG)
method was an extension of the Task-HS procedure and required
raters to identify minimum percent GO scores for each performance
level based on an examination of hands-on test data for the
hypothetical soldiers. Finally, the Task-Abbreviated Percent GO
(Task-APG) method asked raters to set percent GO cutoff scores
without examining any data. For the basic standard setting
instruments (Soldier-Based, Task-Based, and Critical
Incident-Based) and the Task-Based formats (Task-HS, Task-DPG,
and Task-APG), we examined the reliability and congruence of
standards set by different types of judges (NCO vs. Officer,
FORSCOM vs. TRADOC) and the effects of a group discussion on
standards.
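
The way minimum percent GO scores translate into performance levels, and into an implied percentage of soldiers at a given level, can be sketched in Python. Everything specific below is an assumption made for illustration: the cutoff values, the soldier scores, the function names, and the label "Acceptable" for the level between Marginal and Outstanding.

```python
from bisect import bisect_right

# Assumed level labels, lowest to highest (illustration only).
LEVELS = ["Unacceptable", "Marginal", "Acceptable", "Outstanding"]


def classify(percent_go, cutoffs):
    """cutoffs: ascending minimum percent GO scores for each level
    above the lowest. A score equal to a cutoff reaches that level."""
    return LEVELS[bisect_right(cutoffs, percent_go)]


def implied_percent(scores, cutoffs, level):
    """Percentage of the given scores that fall at the named level."""
    hits = sum(1 for s in scores if classify(s, cutoffs) == level)
    return 100.0 * hits / len(scores)


# Hypothetical cutoffs from one judge and hypothetical hands-on scores.
cutoffs = [50, 70, 90]
scores = [45, 55, 72, 95]

print([classify(s, cutoffs) for s in scores])
print(implied_percent(scores, cutoffs, "Unacceptable"))
```

Raising the lowest cutoff increases the implied percentage of unacceptable performers, which is the sense in which a set of cutoffs can be called more or less stringent.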

In  keeping  with  the  goal  to  attempt  to  set  standards  on 
components  of  the  job,  standards  were  set  on  job  dimensions  as 
defined  by  the  Hybrid  job  descriptor  instrument.  Depending  on 
the  standard  setting  instrument  used,  SMEs  set  standards  on  as 
many  as  seven  dimensions  (Soldier-Based)  or  as  few  as  two 
dimensions (Task-Based). For each standard setting procedure,
Table 1.6 presents the average percentage of soldiers performing
at the Unacceptable and Outstanding levels across all applicable
dimensions for each MOS. As in Phase I, the Task-Based formats
resulted in the most stringent standards, although there was a
good deal of variability among judges in setting these
standards.

While  performance  on  some  dimensions  appears  to  be  more 
influential  than  performance  on  other  dimensions,  the  linear 
compensatory  model  aggregation  strategy  used  by  SMEs  in  Phase  I 
was  replicated  in  Phase  II. 

Phase  III  Objectives 

Job  description.  Having  identified  the  Task  Category  model 
as  the  prototypical  job  descriptor  instrument,  the  primary  job 
description  goals  of  Phase  III  were  to  collect  data  on  a  broader 
array  of  MOS  than  had  been  sampled  in  the  earlier  phases  and  to 
more  fully  investigate  various  methods  for  creating  predictor 
equations.  Additional  MOS  included:  12B  (Combat  Engineer),  13B 
(Cannon  Crewman),  27E  (TOW/Dragon  Repairer),  29E  (Radio 
Repairer),  31C  (Single  Channel  Radio  Operator),  51B  (Carpentry 
and Masonry Specialist), 54B (Chemical Operations Specialist),
55B (Ammunition Specialist), 95B (Military Police), 96B
(Intelligence  Analyst)  from  Project  A,  and  31D  (Mobile  Subscriber 
Equipment  Transmission  System  Operator),  a  new  MOS  not  included 
in  Project  A.  Supplementary  goals  were  (a)  to  replicate  previous 
reliability  examinations  with  a  more  thorough  investigation  of 
differences  among  the  rank  (NCO,  Officer,  and  Civilian)  and 


Table 1.6

Methods of Judging Implied Percent of Soldiers Performing at Each
Level

                                                Percent         Percent
      Performance                             Unacceptable    Outstanding
MOS   Dimensionᵃ                Method     N    Mean    SD     Mean    SD

16S   2, 3, 7, 8, 11, 15, 18    Soldier   563   15.0   16.0    17.0   19.0
      2, 3, 7, 11, 15, 18       Incident  426   29.0   18.0    16.0   18.0
      2, 15, 18                 Task-HS   209   38.0   25.0     6.0    8.0
      2, 15, 18                 Task-DPG  180   44.0   17.0     6.0    6.0
      2, 15                     Task-APG   41   39.0   17.0     8.0    9.0

19K   2, 3, 7, 8, 11, 15, 18    Soldier   578    9.0    9.0     9.0   11.0
      2, 8, 15, 18              Incident  212   24.0   15.0    23.0   19.0
      2, 8, 15, 18              Task-HS   148   40.0   26.0    11.0   17.0
      2, 8, 15, 18              Task-DPG  121   42.0   25.0    13.0   17.0
      2, 8, 15, 18              Task-APG   56   53.0   21.0     7.0    7.0

67N   7, 8, 13, 15              Soldier   162   13.0   13.0    12.0   10.0
      8, 13, 17                 Incident  156   17.0   13.0    16.0   14.0
      8, 15, 17                 Task-HS    67   31.0   17.0    13.0    8.0
      8, 13, 17                 Task-DPG   62   30.0   11.0    14.0   10.0
      8, 17                     Task-APG   31   39.0   13.0    15.0   14.0

76Y   10, 11, 16, 17, 19        Soldier   235   21.0   21.0    16.0   19.0
      10, 16, 17, 19            Incident  200   18.0   13.0    17.0   17.0
      16, 17, 19                Task-HS    75   32.0   18.0     8.0    8.0
      16, 17, 19                Task-DPG   71   35.0   15.0     9.0   10.0
      16, 19                    Task-APG   44   41.0   21.0    14.0   19.0

88M   4, 8, 11, 15, 17          Soldier   250   16.0   15.0    14.0   13.0
      4, 8, 15, 17              Incident  208   20.0   14.0    15.0   16.0
      4, 8, 15, 17              Task-HS   102   22.0   17.0    13.0   11.0
      4, 8, 15, 17              Task-DPG   95   33.0   15.0    13.0   12.0
      4, 8, 15, 17              Task-APG   74   44.0   14.0     9.0    6.0

91A   5, 17, 18, 19, 22, 24     Soldier   342   21.0   15.0    18.0   13.0
      5, 18, 19, 22             Incident  228   22.0   16.0    22.0   21.0
      8, 17, 18, 19, 22         Task-HS   146   35.0   25.0     7.0    8.0
      5, 17, 18, 19, 22         Task-DPG  134   40.0   21.0     8.0    7.0
      5, 17, 18, 18, 22         Task-APG  109   41.0   18.0     8.0    8.0

94B   11, 13, 23                Soldier   129   14.0   12.0    11.0   10.0
      11, 13                    Incident   88   26.0   20.0    20.0   18.0

Note. 94B SMEs were not administered any of the Task-Based
protocols because MOS-specific hands-on data is not available for
the dimensions appropriate for this MOS.

ᵃPerformance dimensions are as follows:

 2 = Crew-Served Weapons            15 = Operate Vehicles
 3 = Tactical Movements             16 = Type
 4 = Navigate                       17 = Record Keeping
 5 = First Aid                      18 = Oral Communication
 7 = Detect Targets                 19 = Written Communication
 8 = Repair Mechanical Systems      22 = Medical Treatment
10 = Use Technical References       23 = Food Preparation
11 = Pack and Load                  24 = Leadership
13 = Operate/Install

command  (TRADOC  vs.  operational  units)  rater  groups  and  (b)  to 
replicate  examination  of  the  convergence  among  Task  Questionnaire 
rating  scales. 

Standard  setting.  Goals  for  standard  setting  included 
refining  both  the  Behavioral  Incident  and  Task-Based 
questionnaires  based  on  Phase  II  results.  The  changes  were 
intended  to  simplify  data  collection  procedures  and  to  increase 
reliability.  Instruments  were  also  altered  such  that  standard 
setting  dimensions  were  derived  from  the  Task  Questionnaire, 
rather  than  the  abandoned  Hybrid  Questionnaire.  The  revised 
instruments  required  (a)  assessments  of  reliability  within  and 
across  rater  groups,  (b)  examination  of  the  generalizability  of 
standards  across  dimensions,  and  (c)  examination  of  the 
differences among MOS in standards. In addition, Phase III
standard  setting  attempted  the  linkage  of  performance  standards 
to  selection  standards. 


Summary 

The  Synthetic  Validity  Project  is  attempting  to  develop 
methods  for  addressing  two  of  the  Army's  most  critical  personnel 
management  problems:  (a)  What  information  should  be  used  to 
select  and  classify  people  into  an  MOS  when  empirical  validation 
data  are  not  available?  and  (b)  How  should  selection  standards  be 
set  so  as  to  optimize  the  Army's  goals  and  promote  fairness  and 
equity  for  all  applicants?  The  investigation  of  both  synthetic 
validation  and  standard  setting  proceeded  through  three  iterative 
phases.  Each  phase  considered  additional  measurement  issues  and 
expanded  the  sample  of  jobs.  The  remainder  of  this  report  will 
discuss,  in  detail,  the  results  of  Phase  III,  highlighting  how 
those  results  compare  with  Phases  I  and  II.  Based  on  the 
evaluations  conducted  as  part  of  this  project,  recommendations 
will  be  made  for  further  research  and  for  the  "method  of  choice" 
if  either  synthetic  validation  or  standard  setting  were  to  be 
carried  out  tomorrow. 


Chapter 2: Method and Procedures
Cynthia  K.  Owens-Kurtz  and  Janis  S.  Houston  (PDRII) 


Phase  III  data  collection  workshops  were  conducted  from  the 
end  of  January  to  mid-May  1990  at  10  Army  installations 
throughout  the  United  States.  These  workshops  were  four  hours  in 
length  and  ranged  in  size  from  3  to  18  participants,  with  an 
average  group  size  of  approximately  11.  Separate  workshops  were 
held  for  NCOs  and  Officers  in  most  cases,  with  Civilians 
attending  whichever  session  was  most  appropriate.  Except  in  rare 
instances,  workshops  were  held  separately  by  MOS. 

Description  of  Sample 
General  Description  of  Sample 

During  Army  Project  A,  the  jobs  studied  were  divided  into 
two  subsets,  referred  to  as  "Batch  A"  and  "Batch  Z"  MOS.  For 
Batch  A  MOS,  Task-Based  tests  and  job-specific  rating  scales  were 
developed.  These  measures  provided  useful  input  to  the  Synthetic 
Validity Project research instruments as described in earlier
reports (Peterson, Owens-Kurtz, Hoffman, Arabian, & Whetzel,
1990; Wise, Arabian, Chia, & Szenas, 1989). For Batch Z MOS,
job-specific performance measures were limited to school-based
knowledge tests.

In  the  Synthetic  Validity  Phase  III  workshops,  11  MOS  were 
studied.  Three  of  these  were  Project  A  Batch  A  MOS: 

•  13B  Cannon  Crewman, 

•  31C Single Channel Radio Operator, and

•  95B  Military  Police. 

Of the remaining MOS studied, seven were Project A Batch Z MOS:

•  12B  Combat  Engineer, 

•  27E  TOW/Dragon  Repairer, 

•  29E  Radio  Repairer, 

•  51B  Carpentry  and  Masonry  Specialist, 

•  54B  Chemical  Operations  Specialist, 

•  55B  Ammunition  Specialist,  and 

•  96B  Intelligence  Analyst. 

Two of these MOS, 29E and 96B, were added to Project A after the
1985 Concurrent Validation (CV) data collection. Thus, we
excluded 29E and 96B from analyses that relied on CV data (see
Chapter 4).

One  new  MOS,  31D  Mobile  Subscriber  Equipment  Transmission 
System  Operator,  was  recently  added  to  the  Army  enlisted  MOS  and 
was, therefore, not part of Project A. Because 31D has no
"history," it will serve as a test of the accuracy and
completeness  of  the  research  instruments  and  procedures  for  MOS 
outside  Project  A  and  for  new  MOS. 

Two  populations  of  judges  participated  in  the  Phase  III  data 
collection.  The  first  included  NCOs,  Officers,  and  Civilians  at 
TRADOC  sites  who  help  define  doctrine  and  prepare  training  plans 
for  each  MOS.  The  second  included  NCOs  and  Officers  at  FORSCOM 
sites  who  supervise  or  have  responsibility  for  first-term 
soldiers  in  these  MOS. 

The  six  TRADOC  data  collection  sites  were: 

•  Ft.  McClellan  (54B,  95B), 

•  Ft.  Sill  (13B), 

•  Ft.  Leonard  Wood  (12B,  51B), 

•  Ft. Gordon (29E, 31C, 31D),

•  Ft.  Huachuca  (96B),  and 

•  Redstone  Arsenal  (27E,  55B). 

The  four  FORSCOM  sites  were: 

•  Ft.  Bragg  (27E,  29E,  54B,  55B,  96B), 

•  Ft.  Riley  (27E,  29E,  51B,  55B,  96B), 

•  Ft.  Campbell  (12B,  13B,  31C,  54B,  95B),  and 

•  Ft. Shafter¹ (12B, 13B, 31C, 51B, 95B).

Note  that  31D  workshop  participants  were  available  only  at  Ft. 
Gordon,  a  TRADOC  site. 


¹Ft. Shafter is technically a WESTCOM site. Throughout this
report it is treated as a FORSCOM site because it is an
operational TO&E unit.

Sample Sizes by MOS, Rank, and Command

A total of 930 personnel were requested for the Phase III
data collection. Of this number, 687 (74%) participated.
Table 2.1 presents the sample sizes requested and received by MOS
and site for TRADOC sites. Table 2.2 presents the sample sizes
for FORSCOM sites. Some of the MOS studied are very low density
at the sites available, and some personnel were unavailable due
to world events. (A number of personnel in several of our target
MOS were in Panama during our data collection.) Included in
Tables 2.1 and 2.2 are the number of participants we expected at
each site, based on estimates from our on-site points of contact
(POC) prior to the data collection. We received 687 of a
projected 739 participants, or 92%.

Table 2.3 presents the total number of participants for each
MOS  by  rank  and  command.  The  mean  total  sample  size  for  the  10 
MOS  studied  at  both  TRADOC  and  FORSCOM  sites  (excluding  31D)  is 
67,  with  a  range  of  34  to  81.  Seventeen  individuals  participated 
in  the  31D  workshops. 

Demographics  of  Sample 

Tables 2.4 to 2.7 display the demographics of workshop
participants, separately for each MOS. The variables included in
these tables are race, gender, pay grade, and military and MOS
experience.

Table 2.4 presents the racial distribution of the judges.
The participants were primarily White (67%) or African-American
(25%). The majority of participants (89%) were male, as shown in
Table 2.5.

The pay grade by MOS breakdown appears in Table 2.6.
Although we initially requested only soldiers in pay grades E6 to
E9 for NCO participants, and O2 to O4 for Officers, we relaxed
this constraint when we learned how few NCOs and Officers at
these levels were available at some sites. We were assured by the
POCs that all personnel in grades lower than E6 or O2 who were
tasked to attend the workshops were very knowledgeable in the
target MOS.

The  Army  and  MOS-specific  experience  of  the  workshop 
participants  are  presented  in  Table  2.7.  Because  31D  is  a 
relatively  new  MOS,  the  median  MOS  experience  for  31D  judges  was 
low  (1.0  years).  Overall,  median  Army  experience  was  9.8  years, 
and  median  MOS  experience  was  6.3  years.  We  grouped  the 
participants  into  four  MOS  experience  groups:  0  to  1  year,  1 
year  and  1  month  to  3  years,  3  years  and  1  month  to  6  years,  and 
greater  than  6  years.  The  cross-tabulation  of  these  groups  by 
MOS  appears  in  Table  2.8.  For  each  MOS  experience  category,  the 
median  Army  experience  (in  years)  across  MOS  is  also  presented  in 
Table  2.8. 
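
The four-way grouping of MOS experience described above can be sketched as a small classification function. This is an illustration under stated assumptions, not the procedure actually used: the function name is hypothetical, experience is assumed to be recorded in years as a decimal, and the sample values are invented. The category labels follow the column headings of Table 2.8.

```python
from collections import Counter
from typing import Optional


def mos_experience_group(years: Optional[float]) -> str:
    """Assign a participant to one of the four MOS experience groups.

    Missing values fall into "Unknown", mirroring the Unknown column
    of Table 2.8.
    """
    if years is None:
        return "Unknown"
    if years <= 1:
        return "0-1 Year"
    if years <= 3:
        return ">1-3 Years"
    if years <= 6:
        return ">3-6 Years"
    return ">6 Years"


# Hypothetical MOS experience values (in years) for six participants.
sample = [0.3, 1.0, 2.2, 6.0, 6.3, None]
counts = Counter(mos_experience_group(y) for y in sample)
print(dict(counts))
```

Treating each upper bound as inclusive matches the wording "0 to 1 year" and "1 year and 1 month to 3 years", so a participant with exactly 1.0 years lands in the first group.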

Table 2.1

Phase III Sample: Numbers Requested, Expected, and Received at TRADOC Sites by MOS and Rank

Table 2.2

Phase III Sample: Numbers Requested, Expected, and Received at FORSCOM Sites by MOS and Rank

Table 2.3

Phase III Sample: MOS by Command by Rank

               FORSCOM                  TRADOC                  TOTAL
MOS      NCO  OFF  CIV  Total    NCO  OFF  CIV  Total    NCO  OFF  CIV  Total

12B       31   23    0     54     12   13    2     27     43   36    2     81
13B       29   24    0     53      8    6    6     20     37   30    6     73
27E       15    3    0     18     13    0    3     16     28    3    3     34
29E       28    6    0     34     12    2    3     17     40    8    3     51
31C       27   22    0     49     15   12    1     28     42   34    1     77
31D        0    0    0      0     15    1    1     17     15    1    1     17
51B       30   21    0     51     12   11    6     29     42   32    6     80
54B       27   16    0     43     14   13    2     29     41   29    2     72
55B       31    8    0     39     13    5    6     24     44   13    6     63
95B       29   21    0     50     15   11    0     26     44   32    0     76
96B       27   12    1     40     18    0    5     23     45   12    6     63

TOTAL    274  156    1    431    147   74   35    256    421  230   36    687
Table 2.4

Phase III Sample Demographics: Race

           Black/
         African-  American
MOS      American    Indian  Hispanic     White     Other     Blank     TOTAL

12B            10         1         2        66         2         0        81
13B            10         0         5        55         3         0        73
27E             8         0         4        21         1         0        34
29E            12         0         1        37         1         0        51
31C            37         0         5        33         2         0        77
31D             9         0         0         6         1         1        17
51B            15         0         4        57         4         0        80
54B            19         0         3        48         1         1        72
55B            27         0         3        33         0         0        63
95B            10         2         2        60         2         0        76
96B            13         1         4        45         0         0        63

TOTAL         170         4        33       461        17         2       687
PERCENT       25%       <1%        5%       67%        2%       <1%

Table 2.5

Phase III Sample Demographics: Gender

MOS          Male    Female   Unknown     TOTAL

12B            80         1         0        81
13B            72         1         0        73
27E            32         2         0        34
29E            47         4         0        51
31C            61        15         1        77
31D            13         4         0        17
51B            73         7         0        80
54B            67         3         2        72
55B            56         7         0        63
95B            63        13         0        76
96B            46        17         0        63

TOTAL         610        74         3       687
PERCENT       89%       11%       <1%
Table 2.6

Phase III Sample Demographics: Pay Grade

                                  Pay Grade
MOS       E3   E4   E5  E6-E9  W1-W3   O1  O2-O4  GS9-GS12  Unknown  TOTAL

12B        0    3   12     28      0    8     28         2        0     81
13B        0    0    4     33      0    1     29         5        1     73
27E        0    0   11     17      1    1      1         3        0     34
29E        2    5   18     15      1    2      5         3        0     51
31C        0    1   14     27      0    7     27         1        0     77
31D        0    9    5      1      0    1      0         1        0     17
51B        0    3   10     29      0    9     23         6        0     80
54B        0    0    8     32      0    2     26         2        2     72
55B        0    0   25     19      5    3      5         6        0     63
95B        0    0    2     42      0    4     28         0        0     76
96B        2    1   20     22      4    1      7         6        0     63

TOTAL      4   22  129    265     11   39    179        35        3    687
PERCENT  <1%   3%  19%    39%     2%   6%    26%        5%      <1%

Note. Percentages sum to >100 due to rounding.
Table 2.7

Phase III Sample Demographics: Total Army and MOS Experience in
Years

              Army Experience              MOS Experience
MOS         N    Mdn    MIN    MAX        N    Mdn    MIN    MAX

12B        80    7.8    0.8   25.0       81    5.3    0.1   25.0
13B        72   11.9    2.0   40.0       71    8.9    0.0   29.2
27E        34   11.8    2.4   37.8       34    9.1    0.1   15.0
29E        51    8.6    1.1   25.1       51    4.3    0.0   19.3
31C        76    8.8    1.0   30.3       75    6.0    0.0   23.0
31D        17    6.7    3.0   29.3       17    1.0    0.3    2.2
51B        80   10.1    0.8   30.3       80    5.0    0.0   30.0
54B        71   10.6    1.3   32.8       71    6.2    0.8   12.0
55B        63   10.0    1.3   35.0       63    6.3    0.8   35.0
95B        76   11.5    0.5   22.6       76   10.2    0.5   22.6
96B        63   10.1    0.8   32.3       62    5.1    0.3   20.0

Overall   683    9.8    0.5   40.0      681    6.3    0.0   35.0
Table 2.8

MOS Experience Categories Frequencies by MOS

                                    MOS Experience Category
MOS                    0-1 Year  >1-3 Years  >3-6 Years  >6 Years  Unknown  TOTAL

12B                           8          14          22        37        0     81
13B                           3          12          10        46        2     73
27E                           2           2           3        27        0     34
29E                           9          11          14        17        0     51
31C                          12          13          15        35        2     77
31D                          11           6           0         0        0     17
51B                          14          19          11        36        0     80
54B                           1          11          23        36        1     72
55B                           4           6          19        34        0     63
95B                           3           3          11        59        0     76
96B                           5          14          18        25        1     63

TOTAL                        72         111         146       352        6    687

Army Experience: Mdn        4.1         4.0        6.9a     13.4a     6.6b

aOne case excluded from median due to missing Army experience data.
bTwo cases excluded from median due to missing Army experience data.




Workshop  Procedures  and  Instruments 


Overview  of  Workshop  Procedures 

At the beginning of each four-hour workshop, participants
were provided with an overview of the Synthetic Validation
Project and briefed on the schedule of the day's activities. A
Privacy Act statement was distributed and read, and a Background
Information sheet was completed by each participant.
Participants were given an opportunity to ask questions about the
project and the workshop.

Participants  then  completed  a  job  description  instrument, 
the  Army  Task  Questionnaire.  Next,  two  standard  setting 
exercises  were  administered:  the  Behavioral  Incident  Standard 
Setting  Questionnaire  and  the  Task-Based  Standard  Setting  Form. 
The  order  of  administration  of  the  standard  setting  exercises  was 
varied  across  workshops.  Finally,  participants  completed  an 
instrument  designed  to  assess  the  complexity  of  MOS  tasks,  the 
Task  Complexity  Questionnaire. 

Brief  descriptions  and  samples  of  these  instruments/ 
exercises  appear  below.  A  detailed  description  of  each 
instrument  appears  in  the  relevant  chapter  of  this  report. 
Appendix  A  contains  a  copy  of  all  workshop  instruments  and 
instructions  for  a  single  MOS,  12B,  and  Volume  II  of  this  report 
contains  all  instruments  for  all  MOS. 

Army  Task  Questionnaire 

The Army Task Questionnaire contained 96 task categories and
required participants to make five ratings for each: Frequency,
Importance for Core Technical Proficiency,2 Importance for
General Soldiering Proficiency,3 Importance for Overall Job
Performance,4 and Difficulty.5 (See Figure 2.1 for the
instructions and example.) First, the relative frequency of each
of the 96 tasks was rated. After completing the Frequency
ratings for all 96 task categories, participants made the three
Importance ratings and the Difficulty rating for all tasks with a
Frequency rating greater than 0. When making their ratings,
participants were instructed to consider soldiers with 24 months
of service in the target MOS after Basic and Advanced Individual
Training (AIT) and to consider the full range of duty assignments
for the target MOS. After ratings were completed for the 96
tasks, participants were asked to indicate the percentage of the
target MOS covered by the task categories. Finally, participants
were asked to list any task categories that should be added to
the questionnaire. Most participants completed this
questionnaire within 60 minutes. The Army Task Questionnaire is
described in detail in Chapter 3 of this report; a copy of the
entire instrument appears in Appendix A.


2Core Technical Proficiency is made up of the tasks that are
"central" to the MOS. The tasks represent the core of the job
and are the primary definers of the MOS.

3Individuals in every MOS are responsible for being able to
perform a variety of general soldiering tasks. These are
referred to as "Common Tasks." General Soldiering Proficiency
refers to all Common Tasks.

4Overall Job Performance refers to all areas of job
performance, including Core Technical and General Soldiering
Proficiency. This is total job performance.

5Difficulty refers to how difficult it is to reach and
maintain an acceptable level of proficiency in the task.

Standard  Setting  Exercises 

Two  standard  setting  exercises  were  conducted.  The  order  of 
their  administration,  shown  in  Table  2.9,  varied  across 
workshops.  In  some  instances,  such  as  individuals  arriving  late 
or  make-up  sessions,  the  administration  order  differed  from 
Table  2.9.  In  all  cases,  the  standard  setting  exercises  were 
administered  after  the  Army  Task  Questionnaire. 

In both exercises, participants were asked to set standards
for two or three task areas relevant to the target MOS. Figure
2.2 defines each task area and Table 2.10 shows the task areas
rated by each MOS. The following performance level definitions
were used to set the standards:

•  Unacceptable: Soldiers who consistently perform like
this should not have been selected for this MOS. Their
performance is hurting the Army. Additional training
would not bring their performance up to acceptable
levels.

•  Marginal: Soldiers who consistently perform like this
need extra or remedial training. Their current
performance is of little or no benefit to the Army.

•  Acceptable: Soldiers who consistently perform like this
are doing an adequate job. They are making positive
contributions to the Army.

•  Outstanding: Soldiers who consistently perform like this
are doing extremely well. They are making exceptional
contributions to the Army and are good examples to other
soldiers.
ARMY TASK QUESTIONNAIRE

Please look at the EXAMPLES below and read through their explanations before starting to make your ratings.

Table 2.9

Phase III Administration Order of Standard Setting Exercises

Note. F = FORSCOM; T = TRADOC; BI = Behavioral Incident Instrument; TB = Task-Based
Instrument.


B. Electrical and Electronic Systems Maintenance:

Inspect, install, maintain, or repair electrical or
electronic equipment.

D. Vehicle and Equipment Operations:

Drive or operate heavy mechanical equipment.

H. Clerical:

Type; follow standard procedures to complete forms, copy,
file, and retrieve information; distribute, inspect,
store, and ship materials.

I. Communication:

Give and receive information using oral, written, and
hand/arm signals. Read manuals, publications, maps, etc.
Provide counseling.

M. Individual Combat:

Engage in combat and survival skills; know customs and
laws of war.

N. Crew-served Weapons:

Operate and fire direct and indirect crew-served weapons.


Figure 2.2. Definitions of task areas used in standard setting
exercises.
Table 2.10

Task Areas Used for Standard Setting Exercises for Each MOS

MOS 

B  D 

H 

I 

M 

N 

12B 

X 

X 

13B 

X 

X 

X 

27E 

X 

X 

X 

29E 

X 

X 

X 

31C 

X 

X 

X 

31D 

X 

X 

X 

51B 

X 

X 

54B 

X 

X 

55B 

X 

X 

X 

95B 

X 

X 

X 

96B 

X 

X 

X 

Behavioral  Incident  Standard  Setting  Questionnaire.  In  the 
Behavioral  Incident  Standard  Setting  Questionnaire,  participants 
were  provided  with  20  behavioral  incidents  in  each  of  the  two  or 
three  task  areas  relevant  to  the  target  MOS.  (See  Figure  2.3  for 
the  instructions  and  example.)  The  incidents  came  from  several 
MOS  and  were  sampled  to  ensure  coverage  of  the  task  areas  and 
levels of performance. For incidents involving tasks not
performed in the target MOS, participants were instructed to
think of similar tasks performed in the target MOS. Participants
rated  each  incident  as  indicative  of  Unacceptable,  Marginal, 
Acceptable,  or  Outstanding  performance.  A  "Cannot  Rate"  option 
was  also  provided.  Most  participants  completed  this 
questionnaire  within  30  minutes.  The  Behavioral  Incident 
Standard  Setting  Questionnaire  is  described  in  detail  in 
Chapter  5  of  this  report,  and  a  full  copy  of  it  appears  in 
Appendix  A. 

Task-Based  Standard  Setting  Form.  The  Task-Based  Standard 
Setting  Form  provided  three  sample  tasks  for  each  of  the  two  or 
three  task  areas  relevant  to  the  target  MOS.  (See  Figure  2.4  for 
the  instructions  and  example.)  Participants  decided  as  a  group 
Behavioral Incident Standard Setting Questionnaire
12B - COMBAT ENGINEER


Name:


In this section of the workshop we would like you to help us set job performance
standards on two or three broad performance areas that apply to the MOS that you are rating.
For each area, twenty behavioral incidents, or examples of performance, have been provided
by other SMEs as samples of the types of behaviors that fit each area. These examples
come from a number of different MOS and they vary in level of effectiveness. Thus, some
incidents illustrate poor performance and some illustrate good performance, but they all
illustrate performance within a particular type of job behavior.

For each area, read the definition and think of similar types of tasks that are performed in
the MOS that you are rating. Then for each behavioral incident ask yourself the following
question:

If a soldier CONSISTENTLY performed duties in this area at a level of
effectiveness like the example incident, what kind of soldier would this be?

Refer to the one-page handout containing the definitions of Unacceptable, Marginal,
Acceptable, and Outstanding performance to guide you as you make your ratings. Make your
ratings by thinking of similar types of incidents for your MOS. Circle the letter that matches
that level of effectiveness of incident. If any incident is so unfamiliar that you cannot decide
what level of performance effectiveness it represents, then circle CNR for "cannot rate."

Please make sure that you circle only one response for each example.


Remember: As you make your ratings, think about soldiers who have
about 24 months of service in this MOS after Basic and AIT. Also keep
in mind all that you know about the full range of duty assignments for
this MOS.


Example:

Demonstrate Leadership: Demonstrate leadership and maturity; act as a model, give
direction and instruction to peers; support peers and/or provide informal counseling; and promote
a positive public image of the military.

1. This soldier spent many duty and non-duty hours
learning his new MOS. In a few months, he was tops in his
MOS and was selected as the first E-4 to evaluate other
soldiers in the MOS.

The rater read the definition of Demonstrate Leadership and the example and decided that a soldier
who consistently performed like this example would be demonstrating outstanding leadership.
Therefore, the rater circled the "O" for Outstanding.


Figure 2.3. Instructions and example from the Behavioral
Incident Standard Setting Questionnaire.
Task-Based Standard Setting Exercise
Instructions and EXAMPLE

In this exercise, we would like you to help us set standards for performance in two or three fairly general
areas. These areas could apply to more than one MOS; some examples are Individual Combat, Vehicle
and Equipment Operation, and Communication.

There are two major steps that will be completed for each task area. The first step involves group
participation, while the second step is completed individually. Refer to the EXAMPLE on the next page as you
read through the steps below.

Step 1. Read the Task Area Definition and the Sample Tasks listed there. Under the "Yes/No" column,
circle "Y" if you think the Sample Task is performed in the MOS you are rating; circle "N" if you
think it is not performed in this MOS. If you circle "N," try to think of a task that is performed in
this MOS that is similar to the Sample Task in terms of the type of operations or steps involved,
the kinds of skills required, and the degree of difficulty in performing the task. However, do not
write your "substitute" task down yet.

After everyone has completed this part of the step, we will discuss possible substitute tasks (or
the group may decide that the Sample Task really does occur). After this discussion, a
consensus will be reached about the best substitute tasks, and these will be written on the
appropriate lines.

Look at the EXAMPLE. A group of 63B agreed that "Replace transmission rotor hub assembly"
was not performed in their MOS, and they reached a consensus, after discussion, that
"Replace hydrovac in a 5-Ton" was similar in terms of operations performed, skills required, and
degree of difficulty in performing. The group did think the other two Sample Tasks were
performed in the 63B MOS, so the "Y" is circled for those two tasks, and no substitutes appear.

Step 2. After agreeing on Sample Tasks or substitutes, you will individually complete the second major
step, judging what should be the test score cutoffs on these tasks in order to be viewed as
Marginal, Acceptable, or Outstanding performers (using the Performance Level Definitions).

To help make judgments for the second step, the form provides information about actual soldier
performance on hands-on tests of the Sample Tasks. This test-score information is not based
on SQT scores, where soldiers are allowed to practice repeatedly. The hands-on test scores
referred to here are from specially-developed tests that were given with no advance warning
and no practice allowed.

Look at the EXAMPLE again. In the EXAMPLE, 34 out of 100 soldiers score 55 or worse on
the specially developed hands-on tests for these sample tasks. In other words, 34 out of 100
soldiers could correctly perform 55% or fewer of the steps in the hands-on tests.

The judge in this example decided that getting less than 55% correct on these tasks was
Unacceptable and drew his line marking the Unacceptable category below 55. He felt that
scores less than 75 were Marginal; 75 and above Acceptable. Finally, he felt that scores of 95
and better represent Outstanding performance. Nine out of 100 soldiers (100 minus 91) would
be considered outstanding performers, according to this judge.

PLEASE put your name and the MOS you're rating in the spaces provided on EVERY page.

NOTE: As you make your ratings, think about soldiers who have about 24 months of service in this MOS after
Basic and AIT. Also keep in mind all that you know about the full range of duty assignments for this MOS.

Figure 2.4. Instructions and example from the Task-Based
Standard Setting Form.
EXAMPLE

Task-Based Standard Setting Form

A. Mechanical Systems Maintenance: Inspect, install, maintain, or repair mechanical systems.

Sample Tasks                                   Part of the MOS?    Substitute Tasks
                                               YES/NO

1. Perform operator maintenance
   on M16A1 rifle.                                                 1.

2. Replace transmission in rotor
   hub assembly.

3. Replace wheel bearings.

Actual Hands-On Test-Score Information for these Tasks:

       Test Score (% of Steps      Number of Soldiers Who Score
       Correctly Performed)        the Same or Worse Than This

            100                    100 out of 100 soldiers
   O         95                     91 out of 100 soldiers
             90                     82 out of 100 soldiers
   A         85                     73 out of 100 soldiers
             80                     63 out of 100 soldiers
             75                     57 out of 100 soldiers
             70                     51 out of 100 soldiers
             65                     47 out of 100 soldiers
   M         60                     42 out of 100 soldiers
             55                     34 out of 100 soldiers
             50                     26 out of 100 soldiers
   U         45                     25 out of 100 soldiers
             40                     24 out of 100 soldiers
             35                     23 out of 100 soldiers
             30                     21 out of 100 soldiers
             25                     16 out of 100 soldiers
             20                     11 out of 100 soldiers
             15                     10 out of 100 soldiers
             10                      9 out of 100 soldiers

DRAW 3 LINES THAT MARK THE CUTOFFS BETWEEN THE CATEGORIES.

LABEL THE CATEGORIES: O (Outstanding)
                      A (Acceptable)
                      M (Marginal)
                      U (Unacceptable)

Figure 2.4. Instructions and example from the Task-Based
Standard Setting Form (continued).
whether each sample task was relevant to the target MOS. If a
task was irrelevant, the group suggested alternative tasks
similar to the sample task in terms of the type of operations or
steps involved, the kinds of skills required, and the degree of
difficulty in performing the task. The group discussed the
alternatives, then reached a consensus about the substitute task
to use.

Actual hands-on test score information from Army Project A
was presented on the form in two columns. The first column
listed test scores at 5-point intervals. The second column
presented the number of soldiers (out of 100) who scored at or
below the adjacent test score, based on Project A data.
Participants were instructed to draw three lines to mark the
cutoffs between the four performance levels and to label the
categories O for Outstanding, A for Acceptable, M for Marginal,
and U for Unacceptable.

After  cutoffs  had  been  drawn  for  the  two  or  three  task 
areas,  a  group  discussion  was  conducted  on  two  task  areas.  The 
task  areas  discussed  for  each  MOS  are  listed  in  Table  2.11. 
Individual  Combat  was  always  discussed  first,  followed  by  one 
MOS-specific  task  area.  The  workshop  leader  tallied  the  group's 
responses  on  a  chalkboard  and  pointed  out  the  effect  of 
implementing  the  group's  lowest,  median,  and  highest  standards, 
using  the  Project  A  test-score  data.  The  workshop  leader  then 
directed  a  discussion  of  the  ratings,  asking  participants  to 
state  specific  positive  or  negative  consequences  in  support  of 
their  cutoffs.  Following  the  discussion,  participants  were  asked 
to  rerate  the  task  areas  discussed.  The  complete  Task-Based 
Standard  Setting  exercise  with  discussion  and  rerates  took 
approximately 1 hour. Chapter 5 describes the Task-Based
Standard Setting exercise in detail; copies of the instructions,
rating forms, and group discussion script appear in Appendix A.
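
The chalkboard tally of lowest, median, and highest standards can be illustrated with a short Python sketch. It uses the cumulative test-score figures printed in the Figure 2.4 example; the function names and the five judges' cutoffs are hypothetical, and reading the "effect" of a standard as the number of soldiers (out of 100) scoring at or below the cutoff is an assumption made for this illustration, not a description of the workshop leader's exact procedure.

```python
from statistics import median_low

# Cumulative distribution from the Figure 2.4 example:
# test score -> number of soldiers (out of 100) scoring the same or worse.
SAME_OR_WORSE = {
    100: 100, 95: 91, 90: 82, 85: 73, 80: 63, 75: 57, 70: 51,
    65: 47, 60: 42, 55: 34, 50: 26, 45: 25, 40: 24, 35: 23,
    30: 21, 25: 16, 20: 11, 15: 10, 10: 9,
}


def same_or_worse(score):
    """Soldiers out of 100 who score at or below the given test score."""
    return SAME_OR_WORSE[score]


def better_than(score):
    """Soldiers out of 100 who score above the given test score."""
    return 100 - SAME_OR_WORSE[score]


def summarize_standards(cutoffs):
    """Map the lowest, median, and highest cutoff to (score, soldiers at or below).

    median_low is used so the median is always a score printed on the form.
    """
    picks = {
        "lowest": min(cutoffs),
        "median": median_low(cutoffs),
        "highest": max(cutoffs),
    }
    return {name: (score, same_or_worse(score)) for name, score in picks.items()}


# Hypothetical cutoff scores proposed by five judges for one category.
judge_cutoffs = [50, 55, 55, 60, 70]
for name, (score, n) in summarize_standards(judge_cutoffs).items():
    print(f"{name}: cutoff {score}, {n} of 100 soldiers at or below")
```

The two lookup functions reproduce the arithmetic in the form's own instructions: 34 of 100 soldiers score 55 or worse, and 100 minus 91 leaves 9 soldiers above a score of 95.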

Task  Complexity  Questionnaire 

The  final  instrument  completed  by  participants  was  the  Task 
Complexity  Questionnaire.  In  this  questionnaire,  participants 
responded  to  10  questions  about  a  sample  task  from  each  of  the 
two  areas  discussed  in  the  Task-Based  Standard  Setting  exercise. 
(See Table 2.11 for the list of task areas rated for each MOS and
Figure 2.5 for the instructions and sample pages.) If a
substitute  task  was  used  in  place  of  the  sample  task  in  the  Task- 
Based  Standard  Setting  exercise,  the  same  substitute  was  used  for 
the  Task  Complexity  Questionnaire.  The  10  questions,  designed  to 
assess  the  complexity  or  difficulty  of  the  sample  task,  were 
multiple  choice  and  covered  such  things  as  job  or  memory  aids  and 
mental  processing  requirements  of  the  task.  Most  participants 
completed  this  questionnaire  within  20  minutes.  The  Task 
Complexity  Questionnaire  is  discussed  in  detail  in  Chapter  6;  a 
copy  appears  in  Appendix  A. 
Table 2.11

List of Task Areas to Discuss by MOS

MOS    Task Area 1             Task Area 2

12B    M. Individual Combat    D. Vehicle & Equipment Operations
13B    M. Individual Combat    D. Vehicle & Equipment Operations
27E    M. Individual Combat    B. Electrical & Electronic System Maintenance
29E    M. Individual Combat    B. Electrical & Electronic System Maintenance
31C    M. Individual Combat    I. Communications
31D    M. Individual Combat    I. Communications
51B    M. Individual Combat    D. Vehicle & Equipment Operations
54B    M. Individual Combat    D. Vehicle & Equipment Operations
55B    M. Individual Combat    H. Clerical
95B    M. Individual Combat    H. Clerical
96B    M. Individual Combat    I. Communications



Name:


Task Complexity Questionnaire

12B: Combat Engineer


In this exercise, we would like you to provide information about the complexity or difficulty of
sample tasks selected from two fairly general areas. These areas could apply to more than one
MOS; some examples are Individual Combat, Vehicle and Equipment Operation, and
Communication.

For each of the two tasks presented, there are 10 questions about the task. For several questions,
there are definitions and examples to clarify the meaning of the question. Please read all
definitions and examples before selecting an answer.

NOTE: If the sample task is not performed in the MOS you are rating, please use the substitute
task you used in the standard setting exercise.


Task Category: D. Vehicle and Equipment Operations - Drive or operate heavy mechanical
equipment.


Sample Task: Operate tractor/semitrailer

For the Vehicle and Equipment Operations task listed here, please answer the following 10
questions. The answers to these 10 questions will provide information on the complexity of the
task that is performed by soldiers in the MOS you are rating.

Please circle the most appropriate answer to each question.


Figure 2.5. Instructions and sample pages from the Task
Complexity Questionnaire.
Sample Task: Operate tractor/semitrailer


1. Are job or memory aids used by the soldier in performing this task?

a. Yes

b. No (Go to No. 3 if you answer "No" to this question)

Job and memory aids include memory joggers learned in school (e.g., S-A-L-U-T-E),
instructions printed on or attached to equipment, checklists or worksheets, and manuals that are
routinely used while performing the task.

2. How would you rate the quality of the job or memory aid?

a. There are no job or memory aids for this task.

b. Poor. Even with the job/memory aid, a typical soldier would need a great
deal of additional information.

c. Marginally Good. Even with the job/memory aid, a soldier would need
important additional information.

d. Very Good. With the job/memory aid, a soldier would need only a little
additional information.

e. Excellent. Using the job/memory aid, a soldier can do the entire task correctly
with no additional information or help.

3. Into how many steps is this task typically divided?

a. 1 Step

b. 2-5 Steps

c. 6-10 Steps

d. More than 10 Steps

A step is a separate physical or mental activity within a task which has a well defined,
observable beginning and ending point.

4. Are the steps in this task required to be performed in a definite sequence?

a. The tasks typically have only 1 step.

b. None are required to be performed in a particular sequence.

c. Some, but not all steps must be performed in the correct sequence.

d. All of the steps must be performed in the correct sequence.

5. Does the task provide built-in feedback so that you can tell if you are doing them
correctly?

a. Built-in feedback is provided for all steps

b. Built-in feedback is provided for most steps (>50%)

c. Built-in feedback is provided for only a few steps

d. No built-in feedback is provided for any steps.

Examples of built-in feedback include disassembling equipment where removing one section
automatically uncovers the next section; steps with observable effects such as buzzers, meter
readings, warning lights; and operating equipment built to indicate a logical progression (for
example, adjusting dials from left-to-right).

Figure 2.5. Instructions and sample pages from the Task
Complexity Questionnaire (continued).
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Sample  Task:  Operate  tractor/semitrailer 

6.  Does  the  task  or  parts  of  the  task  have  a  time  limit  for  its  completion? 

a.  There  are  no  time  limits 

b.  There  are  time  limits  that  are  fairly  easy  to  meet  under  test  condidons 

c.  There  are  time  limits  that  are  difficult  to  meet  under  test  conditions. 

7.  How  difficult  are  the  mental  processing  (thinking,  analyzing,  judging,  inferring,  and 
problem  solving)  requirements  of  this  task? 

a.  Almost  no  mental  processing  is  required  (physical  or  highly  repetitive  tasks) 

b.  Simple  mental  processing  is  required  (gross  comparisons,  simple  estimations  or 
calculations) 

c.  Complex  mental  processing  is  required  (choices  or  decisions  based  on  subtle  but 
discrete  clues) 

d.  Very  complex  mental  processing  is  required  (rapid  decisions,  based  on  detailed 
information,  often  under  stress) 

8.  How  many  facts,  terms,  names,  rules,  or  ideas  must  a  soldier  memorize  in  order  to 
do  this  task? 

a.  None  (or  all  are  provided  by  memory/job  aids) 

b.  A  few  (1-3) 

c.  Some  (4-8) 

d.  Very  Many  (more  than  8) 

9.  How  hard  are  the  facts  or  terms  that  must  be  remembered? 

a.  There are no facts or terms to be remembered

b.  Not  at  all  hard  -  the  information  is  simple 

c.  Somewhat  hard  -  some  of  the  information  is  complex 

d.  Very  hard  -  the  facts,  rules,  and  terms  are  technical  or  specific  to  the  task  and 
must  be  remembered  in  exact  detail. 

10.  What  are  the  motor  control  demands  of  this  task? 

a.  None 

b.  Small,  but  noticeable  degree  of  motor  control  is  required  (such  as  driving  a  nail, 
adjusting  a  dial) 

c.  Considerable degree of motor control is needed (such as typing, driving a manual
shift vehicle, or tracking a moving target)

d.  A very large degree of motor control is needed (such as repair of delicate equipment,
or sending Morse code using a key)


Figure  2.5.  Instructions  and  sample  pages  from  the  Task 
Complexity Questionnaire (continued).

Chapter 3: Analysis of the Army Task Questionnaire

R.  Gene  Hoffman,  Carolyn  Hill  Fotouhi, 

David  A.  Campshure  (HumRRO),  and  Wei  Jing  Chia  (AIR) 


In  Phases  I  and  II,  the  Army  Task  Questionnaire  was  shown  to 
have  good  overall  and  within  rater  group  reliability,  to 
adequately  cover  the  job  performance  domain,  and  to  effectively 
discriminate  among  military  occupational  specialties  (MOS). 

Based  on  these  results  and  the  need  to  identify  a  single 
instrument  for  describing  jobs,  the  Army  Task  Questionnaire 
emerged  as  the  prototypic  job  description  instrument.  In  this 
chapter,  we  describe  the  Army  Task  Questionnaire  and  reexamine 
the  issues  of  reliability,  MOS  coverage,  and  discrimination  among 
MOS  in  light  of  data  collected  on  the  Phase  III  MOS.  Data  from 
Phases  I  and  II  are  also  included  where  applicable. 

Instrument  Description 

The  Army  Task  Questionnaire  consists  of  96  task  categories 
that  describe  job  content  in  terms  of  the  tasks  performed.  It  is 
designed  to  be  used  to  describe  all  entry-level  MOS.  Seventeen 
task  dimensions  divide  the  96  task  categories  at  an  intermediate 
level.  Sixteen  of  these  dimensions  are  further  collapsed  into 
four major divisions: (a) maintenance, (b) general operations,
(c) administrative, and (d) combat. The seventeenth dimension,
Supervision, is left separate. The task category taxonomy is
shown  in  Figure  3.1,  and  a  complete  copy  of  the  questionnaire  can 
be  found  in  Appendix  A,  Attachment  1.  The  development  of  the 
Army  Task  Questionnaire  is  described  in  detail  in  Chapter  3  of 
the  Phase  I  Synthetic  Validity  report  (Chia,  Hoffman,  Campbell, 
Szenas,  &  Crafts,  1989). 

In  using  the  Army  Task  Questionnaire,  SMEs  were  asked  to 
consider  the  entire  range  of  duty  assignments  for  soldiers  with 
24  months  experience  beyond  Basic  Training  and  Advanced 
Individual  Training  (AIT)  in  their  particular  MOS.  They  were  to 
complete  the  questionnaire  from  this  frame  of  reference.  SMEs 
first  rated  how  frequently  the  tasks  in  each  category  are 
performed  by  such  soldiers  on  a  scale  from  0  (Never;  this  task  is 
not  part  of  the  job)  to  5  (Most  Often;  this  task  is  performed 
much  more  often  than  most  other  tasks).  After  providing 
Frequency ratings for all 96 task categories, participants rated
the  Importance  and  Difficulty  of  only  those  categories  identified 
as  performed  by  soldiers  with  24  months  experience  in  the  MOS 
(i.e.,  task  categories  with  non-zero  Frequency  ratings). 
Importance ratings were collected for three areas of job
performance: Core Technical, General Soldiering, and Overall
Job. The Core Technical and General Soldiering areas correspond
to  the  Project  A  distinction  between  the  performance  requirements 
of an MOS that are the central aspects of the MOS and that define

I.  Maintenance

A.  Mechanical Systems Maintenance

1.  Perform operator maintenance checks and services

2.  Perform operator checks and services on weapons

3.  Troubleshoot mechanical systems

4.  Repair weapons

5.  Repair mechanical systems

6.  Troubleshoot weapons

B.  Electrical  and  Electronic  Systems  Maintenance 

7.  Install  electronic  components 

8.  Inspect  electrical  systems 

9.  Inspect  electronic  systems 

10.  Repair  electrical  systems 

11.  Repair  electronic  components 

II.  General  Operations 

C.  Pack  and  Load 

12.  Pack  and  load  materials 

13.  Prepare  parachutes 

14.  Prepare  equipment  and  supplies  for  air  drop 

D.  Vehicle  and  Equipment  Operations 

15.  Operate  power  excavating  equipment 

16.  Operate  wheeled  vehicles 

17.  Operate  track  vehicles 

18.  Operate  boats 

19.  Operate  lifting,  loading,  and  grading  equipment 

E.  Construct/Assemble 

20.  Paint 

21.  Install  wire  and  cables 

22.  Repair  plastic  and  fiberglass 

23.  Repair  metal 

24.  Assemble  steel  structures 

25.  Install  pipe  assemblies 

26.  Construct  wooden  buildings  and  other  structures 

27.  Construct  masonry  buildings  and  structures 

F.  Technical  Procedures 

28.  Operate  gas  and  electric  powered  equipment 

29.  Select,  layout,  and  clean  medical/dental  equipment 
and  supplies 

30.  Use  audiovisual  equipment 

31.  Reproduce  printed  material 

32.  Operate  electronic  equipment 

33.  Operate  radar 

34.  Operate computer hardware

35.  Cook 

36.  Perform  medical  laboratory  procedures 

37.  Conduct  land  surveys 

38.  Provide  medical  or  dental  treatment 

Figure 3.1.  Task category taxonomy.

G.  Make Technical Drawings

39.  Sketch  maps,  overlays,  or  range  cards 

40.  Produce  technical  drawings 

41.  Draw  maps  and  overlays 

42.  Draw  illustrations 

III.  Administrative 

H.  Clerical 

43.  Type 

44.  Prepare  technical  forms  and  documents 

45.  Record,  file,  and  dispatch  information 

46.  Receive,  store,  and  issue  supplies,  equipment, 
other  materials 

I.  Communication

47.  Use hand and arm signals

48.  Read technical manuals, field manuals,
regulations, and other publications

49.  Use maps

50.  Send  and  receive  radio  messages 

51.  Give  oral  reports 

52.  Receive  clients,  patients,  guests 

53.  Give  directions  and  instructions 

54.  Write  documents  and  correspondence 

55.  Write  and  deliver  presentations 

56.  Interview 

57.  Provide  counseling  and  other  interpersonal 
interventions 

J.  Analyze  Information 

58.  Decode  data 

59.  Analyze  electronic  signals 

60.  Analyze  weather  conditions 

61.  Order  equipment  and  supplies 

62.  Estimate  time  and  cost  of  maintenance  operations 

63.  Plan  placement  or  use  of  tactical  equipment 

64.  Translate  foreign  languages 

65.  Analyze  intelligence  data 

K.  Applied  Math  and  Data  Processing 

66.  Control  money 

67.  Determine  firing  data  for  indirect  fire  weapons 

68.  Compute  statistics  or  other  mathematical 
calculations 

69.  Provide  programming  and  data  processing  support 
for  computer  operations 

L.  Control  Air  Traffic 

70.  Control  air  traffic 


Figure 3.1.  Task category taxonomy (continued).

IV.  Combat

M.  Individual  Combat 

71.  Use  hand  grenades 

72.  Protect  against  NBC  hazards 

73.  Handle  demolitions  or  mines 

74.  Engage  in  hand-to-hand  combat 

75.  Fire  individual  weapons 

76.  Control  individuals  and  crowds 

77.  Customs  and  laws  of  war 

78.  Navigate 

79.  Survive  in  the  field 

80.  Move  and  react  in  the  field 

N.  Crew-served  Weapons 

81.  Load  and  unload  field  artillery  or  tank  guns 

82.  Fire  heavy  direct  fire  weapons  (e.g.,  tank  main 
guns,  TOW  missile,  IFV  cannon) 

83.  Prepare  heavy  weapons  for  tactical  use 

84.  Place  and  camouflage  tactical  equipment  and 
materials  in  the  field 

85.  Fire  indirect  fire  weapons  (e.g.,  field  artillery) 

O.  Give  First  Aid 

86.  Give  first  aid 

P.  Identify  Targets 

87.  Detect  and  identify  targets 

Q.  Supervision (not included in any of the four major
divisions)

88.  Plan  Operations 

89.  Direct/Lead  Teams 

90.  Monitor/Inspect 

91.  Lead

92.  Act  as  a  Model 

93.  Counsel 

94.  Communicate 

95.  Train 

96.  Personnel  Administration 


Figure  3.1.  Task  category  taxonomy  (continued). 


essential  character  of  the  job  versus  those  performance 
requirements  that  are  part  of  every  soldier's  role  in  the  Army 
regardless  of  MOS  (Campbell,  McHenry,  &  Wise,  1990).  Importance 
was  rated  on  a  scale  from  0  (No  Importance)  to  5  (Extremely  High 
Importance).  SMEs  provided  Difficulty  ratings  by  answering  the 
following  question:  "How  difficult  is  it  to  reach  and  maintain 
an  acceptable  level  of  proficiency  in  this  task?"  Difficulty  was 
rated  on  a  scale  from  1  (Very  Easy;  this  task  can  be  performed 
correctly after less than an hour of instruction and performed
again correctly a year later with little or no practice in
between) to 5 (Very Difficult; this task can be performed
correctly after several weeks of instruction and performed again
correctly only if it is practiced regularly). The Difficulty
scale  was  added  to  the  Phase  III  Army  Task  Questionnaire  to  be 
explored  for  use  in  standard  setting. 

After  completing  the  Army  Task  Questionnaire,  SMEs  estimated 
the  percentage  of  the  MOS  performance  domain  that  was  covered  by 
the  questionnaire.  Specifically,  participants  answered  the 
following  question:  "What  percentage  of  the  MOS  you  are  rating 
is  covered  by  these  task  categories?"  Participants  who  indicated 
that  less  than  100%  of  the  MOS  was  covered  were  asked  to  suggest 
items  that  should  be  added. 

Editing  and  Handling  of  Missing  Data 

Each  completed  Army  Task  Questionnaire  was  screened  for 
three kinds of rating errors. First, missing responses to the
Frequency question were noted. Second, inappropriate missing
responses  to  the  Importance  and  Difficulty  questions  were  noted. 
That  is,  every  task  category  given  a  non-zero  Frequency  rating 
should  have  also  been  rated  for  Importance  in  the  three 
performance  areas  and  for  Difficulty.  The  third  error  that  was 
screened  for  was  inappropriate  responses  to  Importance  and 
Difficulty.  Tasks  with  zero  Frequency  should  not  have  been  rated 
for  Importance  or  Difficulty.  This  last  screen  checked  only  for 
inappropriate non-zero Importance and Difficulty ratings. Raters
had a tendency to fill in zero Importance and Difficulty for
tasks with zero Frequency. Although raters were not instructed to
do so, these ratings do not constitute rating errors and, in fact,
conform to our coding scheme (see Descriptive Statistics below).

Each  of  the  three  errors  was  tabulated.  Twenty-two 
respondents  were  identified  with  missing  Frequency  ratings, 
inappropriate missing Importance ratings, or inappropriate non-zero
Importance ratings for more than 10% of the task categories.
These  individuals  were  dropped  from  further  Army  Task 
Questionnaire  analysis.  An  additional  nine  respondents  were 
dropped  from  analyses  involving  the  Difficulty  scale  because  of 
inappropriate  Difficulty  ratings.  Table  3.1  indicates  the  MOS 
these  respondents  represented. 


Table 3.1

Respondents Dropped from Army Task Questionnaire Analyses

MOS        Number

12B           4
13B           7
27E           1
29E           3
31C           1
31D           0
51B           1
54B           6
55B           2
95B           2
96B           4

Total        31
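
The three screens described above can be sketched in code. The following is a minimal illustration, not the procedure actually used in the project: the record layout (one dictionary per task category with the hypothetical keys "freq", "cti", "gsi", and "oji", where None marks a blank) is an assumption, as is the reading of the 10% rule as applying to each error type separately. Difficulty ratings, which were screened in a parallel way for the Difficulty analyses, are omitted for brevity.

```python
from typing import Dict, List, Optional

# Hypothetical layout: one dict per task category for a single rater.
# "freq" is the Frequency rating; "cti", "gsi", "oji" are the three
# Importance ratings. None means the rater left the item blank.
Rating = Dict[str, Optional[int]]
IMPORTANCE_KEYS = ("cti", "gsi", "oji")


def count_errors(ratings: List[Rating]) -> Dict[str, int]:
    """Tally the three screened error types across one rater's categories."""
    counts = {"missing_freq": 0, "missing_imp": 0, "nonzero_imp": 0}
    for r in ratings:
        freq = r["freq"]
        if freq is None:
            counts["missing_freq"] += 1
        elif freq > 0:
            # A performed task should carry all three Importance ratings.
            if any(r[k] is None for k in IMPORTANCE_KEYS):
                counts["missing_imp"] += 1
        else:
            # Zero Frequency: blank or zero Importance is acceptable,
            # only a non-zero Importance rating counts as an error.
            if any((r[k] or 0) > 0 for k in IMPORTANCE_KEYS):
                counts["nonzero_imp"] += 1
    return counts


def should_drop(ratings: List[Rating], threshold: float = 0.10) -> bool:
    """Assumed rule: drop a rater if any one error type exceeds the threshold."""
    limit = threshold * len(ratings)
    return any(n > limit for n in count_errors(ratings).values())
```

With 96 task categories, this reading of the rule drops a rater once any single error type appears in 10 or more categories.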

Content  of  the  Army  Task  Questionnaire 

Two items were included in the Army Task Questionnaire to
evaluate its content. The first
item, "percentage of MOS covered," asked SMEs to estimate the
percentage  of  the  job  domain  covered  by  the  questionnaire.  The 
second  item  was  directed  toward  SMEs  who  indicated  in  the  first 
item  that  their  MOS  was  not  fully  covered.  These  SMEs  were  asked 
to  list  the  tasks  that  were  omitted.  In  previous  phases  of  the 
project,  raters  often  indicated  that  the  questionnaire  covered 
less  than  100%  of  their  MOS  and  then  were  unable  to  list  new 
tasks  that  were  not  already  included  in  one  or  more  of  the 
existing  task  categories.  Therefore,  we  have  taken  the  position 
that  responses  to  the  first  question  cannot  be  taken  at  face 
value,  but  must  be  supported  by  appropriate  responses  to  the 
second question. Responses to the second question thus
serve  as  the  source  of  information  concerning  desirable 
modifications  to  the  questionnaire  content. 

The content of the responses to the second coverage question
was thoroughly reviewed. Items that were suggested by three or
more  SMEs  within  an  MOS  were  noted  and  checked  against  the  Army 
Task  Questionnaire.  If  the  suggested  item  is  not  included  in  the 
questionnaire,  it  is  presented  in  Table  3.2.  As  in  previous 
phases,  many  of  the  suggestions  do  not  fit  our  concept  of  a  task. 
For  example,  physical  fitness  is  cited  by  raters  from  several 
MOS,  but  it  is  outside  the  performance  domain  targeted  by  the 
Army  Task  Questionnaire.  Likewise,  safety,  per  se,  is  not  a 
task.  Rather,  safety  is  a  generic  concept  that  implies 
performing any task according to its acceptable procedure. Other
items are too broad to interpret. One item suggested by Phase
III participants was also suggested in Phase II: use of light
crew-served weapons. This item could be a
candidate for inclusion in Dimension N (Crew-served Weapons).
Finally, use of tools and tool maintenance is consistently
suggested. In revising the Army Task Questionnaire during the pretest
and pilot-test phases of the project (i.e., prior to Phase
I), such an item was considered to be implicit in the various
maintenance  task  categories  included.  Given  the  consistency  of 
the  suggestion,  tool  maintenance  may  be  another  candidate  for 
inclusion  in  the  questionnaire. 


Table 3.2

Suggested Content for Inclusion on the Army Task Questionnaire

           Number of
MOS        Times Suggested     Suggested Additions

12B               6            Physical Fitness
13B               3            Physical Fitness
27E               4            Physical Fitness
                  2            Safety
                  8            Hand Tool Maintenance
29E               4            Use of Tools and Test Equipment
31C               3            Physical Fitness
31D               3            Physical Fitness
51B               8            Tool Maintenance
                  5            Safety
54B               3            Physical Fitness
55B               4            Fire Control
95B              12            Physical Security
                  3            Light Crew-served Weapons
                  4            Decision Making*
                  4            Physical Fitness
                  7            Law Enforcement*
                  4            Investigations*
96B              10            Physical Security
                  4            Personnel Security

*These tasks are very broad. No guidance was given in the
written comments as to what these tasks actually encompass.
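
The tallying rule behind Table 3.2 (note items suggested by three or more SMEs within an MOS) can be illustrated with a short sketch. It assumes, hypothetically, that each written suggestion has already been coded to a common label and stored as an (MOS, rater ID, item) tuple; the function name and data layout are illustrative only and are not taken from the project.

```python
from collections import Counter


def frequent_suggestions(suggestions, min_count=3):
    """Return {mos: {item: n_raters}} for items named by >= min_count raters.

    `suggestions` is an iterable of (mos, rater_id, item) tuples. This is a
    hypothetical layout that assumes free-text comments were coded to common
    labels beforehand.
    """
    unique = set(suggestions)  # one vote per rater per item
    counts = Counter((mos, item) for mos, _rater, item in unique)
    result = {}
    for (mos, item), n in counts.items():
        if n >= min_count:
            result.setdefault(mos, {})[item] = n
    return result
```

Items that pass this count would still need to be checked by hand against the existing 96 task categories, as described in the text.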


Several  SMEs  were  distracted  by  the  fact  that  a  large 
proportion  of  the  task  categories  did  not  specifically  include 
examples from their MOS. Most tasks recommended for addition to
the  questionnaire  are  examples  of  existing  task  categories.  The 
suggested example tasks fall into one of two groups. The first
and largest group consists of tasks which are subsumed under an
existing task category but are not specifically listed. For
example, Task Category 48 "Read Technical Manuals, Field Manuals,
Regulations, and Other Publications" includes "reading
instructions, diagrams, charts, and tables." However, because
reading blueprints and site layouts is not specified, several
51B SMEs suggested that it be added. This error occurred in previous
phases  in  spite  of  written  and  verbal  instructions  explaining 
that  the  examples  listed  under  each  task  category  are  examples 
only  and  are  not  meant  to  be  exhaustive. 

The  second  group  of  example  tasks  recommended  for  addition 
contains  examples  that  are  already  specified  on  the 
questionnaire.  For  instance,  the  suggested  additions  "install 
and  operate  antenna  systems"  are  specifically  listed  under  Task 
Category 7 "Install Electronic Components." Perhaps the
participants who listed such additions read only the general task
category title and did not read the examples included in that
category. Alternatively, they may have simply forgotten that the
task was covered by the time they reached the end of the long
questionnaire.

Although  some  suggested  additions  appear  to  be  task 
categories  that  have  been  omitted,  we  decided  not  to  revise  the 
Army  Task  Questionnaire  at  this  time.  A  consideration  of  the 
practical  implications  of  revising  the  questionnaire  justifies 
our  decision.  First,  there  appears  to  be  no  strong  consensus 
among  SMEs  that  particular  task  categories  have  been  omitted.  In 
other words, no suggested addition was put forward by a
large group of SMEs. Second, psychologists' validity estimates
are available only for the existing 96 task categories.
Obtaining validity estimates for a few new task categories cannot
be accomplished at this stage of the project. Third,
discriminant  validity  results  (presented  in  the  following 
chapter)  suggest  that  adding  a  limited  number  of  task  categories 
would  have  no  impact  on  synthetic  validity  equations. 

Descriptive  Statistics 

For  each  of  the  11  Phase  III  MOS,  means,  standard 
deviations,  and  sample  sizes  for  Frequency,  Core  Technical 
Importance,  General  Soldiering  Importance,  Overall  Job 
Importance,  and  Difficulty  ratings  were  calculated.  Table  3.3 
presents  the  means  of  the  Core  Technical  Importance  ratings  for 
each  of  the  Phase  III  MOS  for  the  Army  Task  Questionnaire.  We 
frequently  refer  to  these  mean  ratings  as  "profiles."  Mean  Core 
Technical  Importance  ratings  of  3.5  or  higher  are  highlighted. 
Appendix  B  presents  the  complete  results  for  means  of  the 
different  types  of  ratings  for  each  MOS  in  order  of  decreasing 
Frequency. 

Table 3.3 (continued)


When the questionnaire was administered, participants
provided Frequency ratings for all items and Importance and
Difficulty ratings for only those items with non-zero Frequency
ratings. In calculating descriptive statistics, Importance and
Difficulty values for task categories with zero Frequency ratings
were set to zero rather than treated as missing. Given the
rating procedure employed, a zero Frequency rating implies zero
Importance and Difficulty ratings, and these zero ratings should
be considered when examining the mean importance of items.
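
The effect of this coding rule on a category mean can be shown with a small sketch. The function and the example values below are illustrative assumptions, not project data: each list holds one entry per rater for a single task category, and None marks a blank Importance rating.

```python
def mean_importance(freq, imp):
    """Mean Importance for one task category under the zero-fill rule.

    freq, imp: parallel lists with one entry per rater (hypothetical layout).
    A zero Frequency rating is treated as implying zero Importance, whether
    the rater left Importance blank or entered a zero.
    """
    filled = []
    for f, i in zip(freq, imp):
        if f == 0:
            filled.append(0)   # zero Frequency implies zero Importance
        elif i is not None:
            filled.append(i)   # performed task with a usable rating
        # A performed task with a blank rating stays missing and is skipped.
    return sum(filled) / len(filled) if filled else None
```

For example, with Frequency ratings [0, 3, 4, 0] and Importance ratings [None, 4, 5, None], the zero-fill rule gives a mean of 2.25, whereas treating the two blanks as missing would give 4.5. The zero-filled mean reflects the fact that half of the raters do not regard the category as part of the job.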

Reliability  Analyses 


Approach 

Army  Task  Questionnaire  ratings  were  obtained  from  raters  of 
different  ranks  (NCO,  Officer,  Civilian)  representing  different 
commands  (TRADOC  MOS  proponent  agencies  and  operational  Table  of 
Organization  and  Equipment  [TO&E]  units).  Reliability  questions 
for  the  Army  Task  Questionnaire  revolve  around  two  overlapping 
issues: (a) Do raters within the same rater group agree with
each other? and (b) Do ratings from the different rater groups
agree? An alternative way to phrase the second issue is: Do
raters combined from all rater groups agree with each other?
These questions were addressed separately for each MOS, and the
results were summarized across MOS.

For  the  Phase  III  Army  Task  Questionnaire  data,  the 
reliability  estimation  procedure  paralleled  the  procedure  used  in 
Phases  I  and  II  for  the  Attribute  Questionnaire.  The  procedure 
is  a  simplification  of  the  fully  developed  generalizability  model 
used for combined rater groups in Phase I and II analyses of the
Task and Activity Questionnaires. Instead of explicitly teasing
out  variance  components  for  rank  and  command,  only  task  category 
and  rater  components  were  estimated.  These  were  then 
appropriately  combined  into  a  reliability  coefficient  that  treats 
task  categories  (the  objects  of  measurement)  as  fixed,  raters  as 
random,  and  includes  both  rater  variance  and  rater-by-task 
category  variance  as  measurement  error.  Formulas  may  be  found  in 
Brennan  (1983).  In  the  Phase  I  and  II  reports,  the  explicit 
interpretation  of  the  rank  and  command  variance  components  tended 
to  be  ignored.  Instead,  combined  group  reliabilities  were 
compared  to  within-group  reliabilities  to  determine  agreement 
across  rater  groups.  Thus,  if  rater  groups  disagreed  with  each 
other,  then  combined  group  reliabilities  would  be  noticeably 
lower than within-group reliabilities.
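
The simplified procedure can be sketched as follows. This is an illustration under stated assumptions rather than the project's own computation: it assumes a complete task-category-by-rater matrix, estimates variance components from the usual two-way mean squares, and forms a single-rater coefficient in which rater variance and the rater-by-task-category residual both count as error. The exact formulas used in the report are those in Brennan (1983); the function name is hypothetical.

```python
def single_rater_reliability(x):
    """Single-rater reliability from a complete ratings matrix.

    x[t][r] is the rating of task category t by rater r (assumed complete).
    Returns var_task / (var_task + var_rater + var_residual), so both rater
    main effects and rater-by-task interaction are treated as error.
    """
    n_t, n_r = len(x), len(x[0])
    grand = sum(map(sum, x)) / (n_t * n_r)
    task_means = [sum(row) / n_r for row in x]
    rater_means = [sum(x[t][r] for t in range(n_t)) / n_t for r in range(n_r)]

    # Sums of squares for the two-way layout with one rating per cell.
    ss_task = n_r * sum((m - grand) ** 2 for m in task_means)
    ss_rater = n_t * sum((m - grand) ** 2 for m in rater_means)
    ss_total = sum((v - grand) ** 2 for row in x for v in row)
    ss_res = ss_total - ss_task - ss_rater

    ms_task = ss_task / (n_t - 1)
    ms_rater = ss_rater / (n_r - 1)
    ms_res = ss_res / ((n_t - 1) * (n_r - 1))

    # Variance component estimates; negative estimates are set to zero.
    var_task = max((ms_task - ms_res) / n_r, 0.0)
    var_rater = max((ms_rater - ms_res) / n_t, 0.0)
    var_res = max(ms_res, 0.0)
    return var_task / (var_task + var_rater + var_res)
```

Because rater variance is counted as error, two raters who rank the task categories identically but differ by a constant amount still receive a coefficient below 1.0. The sketch does not guard against a matrix in which every rating is identical, where all components are zero.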

Because the data for the full generalizability model are neither
complete nor balanced, previous analyses required
computation by an expensive variance components estimation
procedure on a mainframe computer. The data for the simpler
procedure of estimating only task category and rater variance are
complete. The simpler procedure can be executed on a personal computer and is
therefore  much  more  economical.  It  is  supplemented  with 
supporting  analyses  that  make  the  amount  of  information  obtained 
concerning  rater  agreement  as  comprehensive  as  that  reported  in 
Phases  I  and  II. 




Results 


Table  3.4  presents  the  single-rater  reliability  estimates 
for  each  rater  group  and  for  combined  rater  groups  for  each  MOS. 
Single-rater  reliability  estimates  are  appropriate  for  comparing 
differences  among  rater  groups  and  MOS,  and  they  provide  the 
basic  data  needed  to  make  projections  (i.e.,  via  Spearman-Brown) 
concerning  the  number  of  raters  needed  in  future  uses  of  the  Army 
Task  Questionnaire.  Table  3.5  summarizes  the  single-rater 
reliabilities  by  presenting  mean  reliabilities  across  the  MOS  for 
each  rating  scale  and  across  all  rating  scales.  Reliabilities 
were  computed  only  for  rater  groups  with  four  or  more  raters. 

Because  mean  profiles,  presented  in  Table  3.3  and  Appendix 
B, were calculated using all raters combined, Table 3.4 also
presents reliability estimates based on all raters within each
MOS. These were calculated by stepping up the single-rater
reliability  for  all  raters  combined  (the  next  to  last  column  in 
Table  3.4)  to  estimate  total  reliability  with  all  raters.  For 
example,  77  is  the  total  number  of  raters  for  12B.  Therefore,  77 
was  used  to  estimate  the  total  reliability  for  all  rating  scales 
(Frequency,  Core  Technical  Importance,  General  Soldiering 
Importance,  Overall  Job  Importance,  and  Difficulty)  for  12B. 
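
The step-up from a single-rater reliability to the reliability of a mean over k raters is the Spearman-Brown projection. A minimal sketch follows; the function names are illustrative, and the value .50 used in the usage note is the all-scales mean for all raters combined from Table 3.5, chosen only for illustration, not the 12B entry from Table 3.4.

```python
import math


def spearman_brown(r1, k):
    """Reliability of the mean of k raters, given single-rater reliability r1."""
    return k * r1 / (1 + (k - 1) * r1)


def raters_needed(r1, target):
    """Smallest number of raters whose mean rating reaches `target` reliability."""
    k = target * (1 - r1) / (r1 * (1 - target))
    # Subtract a tiny tolerance so floating-point noise does not add a rater.
    return math.ceil(k - 1e-9)
```

Calling spearman_brown(0.50, 77) gives a value close to .99, and spearman_brown(0.50, 10) is just above .90. The inverse function shows how the required number of raters grows as the single-rater reliability falls: raters_needed(0.41, 0.90) is larger than raters_needed(0.50, 0.90).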

Differences  among  reliability  estimates.  In  general,  the 
single-rater  reliabilities  are  high  for  all  scales  and  for  all 
rater  groups.  Because  of  the  large  number  of  raters  in  the 
sample,  many  of  the  reliabilities  for  the  mean  ratings  approach 
.99.  Based  on  the  single-rater  reliability  estimates,  group 
reliabilities  would  exceed  .90  for  all  of  the  Frequency  and 
Importance  ratings  with  as  few  as  10  raters.  However,  there  are 
a  number  of  comparisons  to  consider.  These  include  differences 
between  the  rater  groups  and  differences  among  the  reliabilities 
of  the  scales.  For  the  rater  groups,  differences  between 
Officers  and  NCOs  (across  commands)  and  differences  between 
commands  (across  Officers  and  NCOs)  are  of  most  interest.  Also, 
civilian  reliabilities  compared  to  the  soldiers'  (Officers  and 
NCOs  combined)  are  of  interest.  For  the  scales,  the  differences 
of  greatest  interest  are  those  between  the  Frequency  scale  and 
the  Importance  scales,  and  those  between  the  Difficulty  scale  and 
the Frequency and Importance scales.

Using  the  single-rater  reliabilities  from  Table  3.4  with 
appropriate  coding  for  rank,  command,  and  scale,  a  series  of 
orthogonal, planned comparisons was conducted to test the
statistical significance of differences among the reliability
estimates. Based on reliabilities computed on all rater groups
(the next to last column in Table 3.4), Frequency and Importance
reliabilities are not detectably different (F(1,50) = 2.16, ns), but
Difficulty reliabilities are lower than those for Frequency and
Importance (F(1,50) = 39.49, p < .01). Rater group comparisons
showed no rank differences (F(1,142) = 1.10, ns). Finally, FORSCOM
raters agreed more among themselves than TRADOC raters (F(1,103) =
6.12, p < .02). Even for the statistically significant

Table 3.5


Army Task Questionnaire Mean Reliabilities by Rank and Command

                     TRADOC              FORSCOM             COMBINED
                NCO  OFF  CIV  TOT     NCO  OFF  TOT     NCO  OFF  CIV  TOT

N(a)             11    7    4   11      10    9   10      11    9    4   11
Frequency       .55  .55  .49  .54     .57  .58  .56     .55  .57  .48  .55

N(a)             11    7    4   11      10    9   10      11    9    4   11
CTI             .52  .49  .44  .49     .52  .54  .52     .52  .52  .43  .50
GSI             .51  .53  .43  .49     .54  .56  .54     .52  .55  .42  .52
OJI             .53  .52  .43  .51     .56  .57  .56     .54  .55  .42  .53

N(a)             11    7    4   11      10    9   10      11    9    4   11
Difficulty      .39  .38  .38  .38     .43  .45  .43     .41  .43  .37  .41

N(a)             55   35   20   55      50   45   50      55   45   20   55
All Scales      .50  .50  .43  .48     .52  .54  .52     .51  .52  .42  .50

N(a)             44   28   16   44      40   36   40      44   36   16   44
Frequency &
Importance      .53  .52  .45  .51     .55  .56  .54     .53  .55  .44  .52

Note. CTI = Core Technical Importance, GSI = General Soldiering
Importance, OJI = Overall Job Importance.

(a) Number of reliability coefficients included in the analyses.


differences,  the  magnitude  of  the  reliability  differences  among 
scales  or  among  rater  groups  is  not  large.  In  addition, 
reliability  information  is  not  a  sufficient  determining  factor  in 
selecting scales or rater groups for use in developing synthetic
validation.  There  are  other  issues  to  consider. 

Agreement  among  rater  groups.  Table  3.5  also  provides  some 
evidence  for  the  agreement  among  rater  groups.  That  is,  if  the 
different  groups  are  providing  different  mean  task  category 
ratings,  then  reliabilities  estimated  across  groups  will  be  lower 
than  the  separate  reliabilities  estimated  for  each  rater  group. 
For  example,  NCO  and  Officer  reliabilities  average  slightly 
higher  than  total  group  reliabilities.  In  theory,  a  more 
powerful  way  to  test  group  differences  in  the  task  category 
ratings  is  by  repeated  measures  ANOVA  on  the  raters  with  their 
task  category  ratings  as  the  repeated  "trials"  and  rank  and 
command  as  grouping  factors.  The  number  of  trials  (96  task 
category  ratings  per  scale),  however,  is  excessive.  An 
alternative approach, with two parts, was used. Each part was

relevant tasks. This difference may not be trivial. If NCOs,
compared  to  Officers,  are  identifying  a  larger  set  of  task 
categories  as  relevant  to  their  MOS,  then  there  is  a  possibility 
that  the  NCO  profiles  will  lead  to  the  selection  of  a  larger  set 
of  predictors  for  the  MOS.  Another  difference  is  that  civilians 
provide  lower  average  ratings  than  either  NCOs  or  Officers  on  all 
scales  except  Difficulty.  Also,  command  differences  occur  for 
Frequency, Overall Job Importance, and Difficulty. In each case,
TRADOC ratings average higher than FORSCOM ratings.

Examination  of  proportions  of  non-zero  ratings  was  conducted 
by converting rating group mean profiles to profiles of 1s and
0s. Group means of 0.0 for task categories occur only when all
raters within a group rate the item as zero. Rather than
strictly using these means, task categories with mean ratings of
less than 1.0 were recoded as zero. Task categories with mean
ratings equal to or greater than 1.0 were recoded as 1. The
selection of 1 as the cutoff point means either that all raters
agree that the task category has some frequency or importance, or
that a sufficient number of raters believe a task category is of
more than minor importance (ratings of 2.0 or more) to offset the
opinions of those that believe the category is not part of the
MOS. Tables 3.9 and 3.10 present analyses using only NCOs and
Officers.
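
The recoding rule is simple enough to state in a few lines. The sketch below uses hypothetical function names and assumes each profile is a list of group mean ratings, one per task category.

```python
def binarize_profile(means, cutoff=1.0):
    """Recode a mean profile: 1 if the mean is at or above the cutoff, else 0."""
    return [1 if m >= cutoff else 0 for m in means]


def proportion_relevant(means, cutoff=1.0):
    """Proportion of task categories a rater group treats as part of the MOS."""
    flags = binarize_profile(means, cutoff)
    return sum(flags) / len(flags)
```

Comparing proportion_relevant for the NCO and Officer profiles of the same MOS gives a direct measure of whether one group identifies a larger set of relevant task categories than the other.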


Table 3.6

Mean Level of Army Task Questionnaire Ratings for Five Rater
Groups

                                      Rater Group

                           NCO                Officer          Civilian
Scale                TRADOC   FORSCOM     TRADOC   FORSCOM      TRADOC

N of raters             672       672        672       672         288
Frequency             1.510     1.470      1.466     1.420       1.175
Core Tech Imp         1.676     1.653      1.571     1.537       1.295
Gen Soldier Imp       1.703     1.656      1.604     1.586       1.310
Overall Job Imp       1.811     1.740      1.721     1.695       1.452
Difficulty            1.621     1.480      1.764     1.609       1.482

Table 3.7


Rank  and  Command  Effects  on  Army  Task  Questionnaire  Ratings 


Dependent 

Variable 

Within- 

Subjects 

SS 

df 

MS 

F 

L 

Frequency 

Rank  (R) 
Error 

1.498 

100.016 

1 

671 

1.498 

0.149 

10.053 

0.002 

Command  (C) 
Error 

1.231 

131.572 

1 

671 

1.231 

0.196 

6.280 

0.012 

R  X  C 

Error 

0.008 

71.967 

1 

671 

0.008 

0.107 

0.072 

0.788 

Core  Tech  Imp 

Rank  (R) 
Error 

8.269 

176.314 

1 

671 

8.269 

0.263 

31.468 

0.000 

Command  (C) 
Error 

0.556 

177.078 

1 

671 

0.556 

0.264 

2.108 

0.147 

R  X  C 

Error 

0.017 

82.628 

1 

671 

0.017 

0.123 

0.141 

0.707 

Gen  Soldier 

Imp 

Rank  (R) 
Error 

4.795 

121.656 

1 

671 

4.795 

0.181 

26.449 

0.000 

Command  (C) 
Error 

0.710 

136.680 

1 

671 

0.710 

0.204 

3.486 

0.062 

R  X  C 

Error 

0.135 

88.455 

1 

671 

0.135 

0.132 

1.024 

0.312 

Overall  Job 

Imp 

Rank  (R) 
Error 

3.049 

126.166 

1 

671 

3.049 

0.188 

16.214 

0.000 

Command  (C) 
Error 

1.582 

143.506 

1 

671 

1.582 

0.214 

7.397 

0.007 

R  X  C 

Error 

0.348 

89.069 

1 

671 

0.348 

0.133 

2.625 

0.106 

Difficulty 

Rank  (R) 
Error 

12.475 

126.011 

1 

671 

12.475 

0.188 

66.428 

0.000 

Command  (C) 
Error 

14.719 

138.612 

1 

671 

14.719 

0.207 

71.250 

0.000 

R  X  C 

Error 

0.031 

78.992 

1 

671 

0.031 

0.118 

0.267 

0.606 
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Table  3.8 


Civilian  Versus  Soldier  Effects  on  Army  Task  Questionnaire 
Ratings 


Dependent 

Variable 

Within- 
Sub jects 
Effects 

S3 

df 

MS 

F 

£ 

Frequency 

Civilian 

Error 

11.644 

81.432 

1 

287 

11.644 

0.284 

41.037 

0.000 

Core  Tech  Imp 

Civilian 

Error 

12.600 

104.770 

1 

287 

12.600 

0.365 

34.517 

0.000 

Gen  Soldier  Imp 

Civilian 

Error 

23.586 

104.590 

1 

287 

23.586 

0.364 

64.722 

0.000 

Overall  Job  Imp 

Civilian 

Error 

14.562 

119.213 

1 

287 

14.562 

0.415 

35.057 

0.000 

Difficulty 

Civilian 

Error 

1.430 

119.251 

1 

287 

1.430 

0.416 

3.441 

0.065 

Table  3 . 9 

Proportion  of  Non-zero  Rated 
Groups 

Task  Categories 

for  Four  Rater 

Rater 

Group 

NCO 

Officer 

Scale 

TRADOC 

FORSCOM 

TRADOC 

FORSCOM 

N  of  raters 

672 

672 

672 

672 

Frequency 

0.542 

0.516 

0.554 

0.527 

Core  Technical  Importance 

0.579 

0.554 

0.568 

0.557 

General  Soldier  Importance 

0.579 

0.551 

0.576 

0.561 

Overall  Job  Importance 

0.594 

0.564 

0.612 

0.585 

Difficulty 

0.635 

0.577 

0.680 

0.631 
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Table  3.10 


Rank  and  Command  Effects  on  Proportion  of  Non-zero  Rated  Task 
Categories 


Dependent 

Variable 

Within- 
Sub jects 

SS 

df 

MS 

F 

Frequency 

Rank  (R) 
Error 

0.084 

38.166 

1 

671 

0.084 

0.057 

1.472 

0.226 

Command  (C) 
Error 

0.456 

40.794 

1 

671 

0.456 

0.061 

7.496 

0.006 

R  X  C 

Error 

0.000 

24.250 

1 

671 

0.000 

0.036 

0.010 

0.919 

Core  Tech  Imp 

Rank  (R) 
Error 

0.009 

44.241 

1 

671 

0.009 

0.066 

0.141 

0.707 

Command  (C) 
Error 

0.233 

39.017 

1 

671 

0.233 

0.058 

3.999 

0.046 

R  X  C 

Error 

0.030 

28.220 

1 

671 

0.030 

0.042 

0.717 

0.398 

Gen  Soldier 

Imp 

Rank  (R) 
Error 

0.009 

42.241 

1 

671 

0.009 

0.063 

0.148 

0.701 

Command  (C) 
Error 

0.313 

34.937 

1 

671 

0.313 

0.052 

6.009 

0.014 

R  X  C 

Error 

0.030 

29.220 

1 

671 

0.030 

0.044 

0.692 

0.406 

Overall  Job 

Imp 

Rank  (R) 
Error 

0.251 

37.749 

1 

671 

0.251 

0.056 

4.470 

0.035 

Command  (C) 
Error 

0.537 

38.463 

1 

671 

0.537 

0.057 

9.372 

0.002 

R  X  C 

Error 

0.001 

28.999 

1 

671 

0.001 

0.043 

0.034 

0.853 

Difficulty 

Rank  (R) 
Error 

1.621 

47.379 

1 

671 

1.621 

0.071 

22.950 

0.000 

Command  (C) 
Error 

1.929 

43.071 

1 

671 

1.929 

0.064 

30.045 

0.000 

R  X  C 

Error 

0.013 

29.987 

1 

671 

0.013 

0.045 

0.300 

0.584 

The proportions of task categories with non-zero ratings for
Frequency, Core Technical Importance, and General Soldiering
Importance are not statistically different for NCOs and Officers.
This suggests that the differences in mean levels of these rater
group profiles are due to higher ratings given to relevant
(non-zero) task categories by NCOs rather than to NCOs
identifying a greater number of relevant task categories.

On the other hand, rank differences are significant for the
proportions of task categories given non-zero Overall Job
Importance and Difficulty ratings. This suggests that the mean
differences in profiles may be due, at least in part, to more
task categories being rated in those areas.

Other subtle differences may be disentangled from these
results. However, the sizes of the mean differences that are
statistically significant are not particularly large (e.g.,
command mean differences on Frequency are .05). Furthermore, the
statistical posture of these analyses is directed toward
highlighting differences rather than similarities. The final
analysis of the mean profiles highlights the similarities among
the rater groups.

Table 3.11 presents the correlations between the rater group
profiles. Excluding the civilians, these correlations average in
the upper .80s to lower .90s. While the correlations are high,
they are not perfect. These correlations, viewed in light of the
profile level analyses, suggest that every group profile may be
considered highly similar to the other group profiles, but each
also offers some unique variance in task category ratings.
Recall that these group profiles are highly reliable, so it is
difficult to dismiss the unique information as unreliable noise.

Conclusions  Concerning  Army  Task  Questionnaire  Reliability 

As in Phases I and II, the Army Task Questionnaire continues
to produce high reliability both within and across the different
rater groups. Phase III analyses extended the previous analyses
to investigate the differences between rater groups more
thoroughly. Small, but statistically identifiable, differences in
ratings from the different rater groups were found. It has been
the opinion throughout this project that psychometrics alone
cannot determine whom to use as MOS raters. Which group or
combination of groups represents or should represent the "true"
MOS profile is a political question. The data do suggest that
careful consideration should be given before using civilians;
their perspective is the most divergent, and the constituency
they represent may be the least certain. For some MOS, the
identification of appropriate Officers was a problem, suggesting
the Officer group should not be automatically included in
obtaining MOS data. On the other hand, there should be no
question about the inclusion of NCOs. They represent the senior
leadership in the MOS, and they are actively involved with MOS
development at the schools. The psychometric differences for
command, coupled with the political perspective, suggest that
ratings obtained from TRADOC and from a sample of operational
units may be the most acceptable. Considering the Phase II
synthetic equation discriminant validities and foreshadowing the
Phase III results presented in the next chapter, the sample of
raters should be large enough to provide as much of the
discrimination potential available from the Army Task
Questionnaire as reasonably feasible. For example, a sample of
20 TRADOC raters and 40 raters from operational units from at
least two sites would give a total of 60 raters. Sixty raters
would yield mean Core Technical Importance ratings with a
reliability of .98 given the single-rater reliability estimates
of .50 in Table 3.5.


Table 3.11

Correlations Among Rater Groups' Army Task Questionnaire Profiles

                                                             Matrix mean r
                      TRADOC   FORSCOM   TRADOC    FORSCOM   w/civilians
                      NCO      NCO       Officers  Officers  (w/o civilians)

Frequency Scale:
  FORSCOM NCOs        0.925
  TRADOC Officers     0.905    0.903                           .89
  FORSCOM Officers    0.895    0.945     0.900                (.93)
  TRADOC Civilians    0.854    0.854     0.861     0.840

Core Technical Importance Scale:
  FORSCOM NCOs        0.928
  TRADOC Officers     0.870    0.874                           .87
  FORSCOM Officers    0.876    0.935     0.885                (.89)
  TRADOC Civilians    0.839    0.841     0.853     0.834

General Soldiering Importance Scale:
  FORSCOM NCOs        0.926
  TRADOC Officers     0.892    0.920                           .84
  FORSCOM Officers    0.901    0.948     0.916                (.92)
  TRADOC Civilians    0.867    0.865     0.835     0.836

Overall Job Importance Scale:
  FORSCOM NCOs        0.931
  TRADOC Officers     0.897    0.920                           .89
  FORSCOM Officers    0.900    0.950     0.909                (.92)
  TRADOC Civilians    0.863    0.853     0.826     0.837

Difficulty Scale:
  FORSCOM NCOs        0.895
  TRADOC Officers     0.851    0.878                           .85
  FORSCOM Officers    0.842    0.921     0.872                (.88)
  TRADOC Civilians    0.798    0.839     0.780     0.809

                      TRADOC   FORSCOM   TRADOC    FORSCOM   TRADOC
N of raters:          NCO      NCO       Officers  Officers  Civilians
  TRADOC NCOs         1056
  FORSCOM NCOs         960      960
  TRADOC Officers      672      672       672
  FORSCOM Officers     864      864       672       864
  TRADOC Civilians     384      384       288       384       384

Army  Task  Questionnaire  Scale  Validities 

Phase II showed that the Army Task Questionnaire scales are
highly redundant, but that the scales do show some degree of
discrimination among the MOS. These conclusions stemmed from
analyses of a multitrait-multimethod correlation matrix computed
from task category rating profiles for each MOS on each scale.
That is, the 10 Phase I and II MOS represented different traits,
and the four rating scales represented different assessment
methods. The 96 Army Task Questionnaire items represented cases
from which the multitrait-multimethod matrix was constructed.
For Phase III, this procedure was repeated using the 11 Phase III
MOS as traits and the five scales as methods. Appendix C
presents the full multitrait-multimethod matrix. Summary results
are presented below.

Discrimination  Among  MOS 

Discriminant  validities  are  the  correlations  across 
questionnaire  task  categories  between  different  MOS  assessed  by 
the  same  rating  scale.  Table  3.12  shows  the  average  (r  to  z)  of 
these  correlations  for  each  rating  scale.  Phase  I  and  II  results 
are  displayed  for  comparison. 
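
The "r to z" averaging mentioned above is conventionally done with
Fisher's transformation: each correlation is converted to z, the z
values are averaged, and the mean is converted back to r. The sketch
below assumes that convention; the function name and the three input
correlations are hypothetical, since the individual MOS-by-MOS
correlations appear only in Appendix C.

```python
import math


def mean_r_via_z(correlations):
    """Average correlations through Fisher's r-to-z transformation."""
    zs = [math.atanh(r) for r in correlations]  # r -> z
    mean_z = sum(zs) / len(zs)
    return math.tanh(mean_z)                    # z -> r


# Hypothetical same-scale, different-MOS correlations (illustrative only).
example_rs = [0.45, 0.58, 0.70]
print(round(mean_r_via_z(example_rs), 3))
```

Averaging in the z metric gives a slightly higher value than the
simple arithmetic mean of the same correlations, because the
transformation stretches the scale as r approaches 1.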

Phase III results are congruent with previous results. The
discriminant validity correlations replicated on 11 different MOS
are virtually identical to those based on the 10 Phase I and II
MOS. As expected, the Core Technical Importance scale, having
the lowest average discriminant correlations, shows the greatest
discrimination among the MOS, and the General Soldiering
Importance scale shows the least discrimination. It is
interesting that the two scales that address the relevance of the
task categories to the whole job (i.e., the Frequency and Overall
Job Importance scales) are quite different in the extent to which
they discriminate among the MOS.


Table 3.12

Mean Discriminant Validity (Same Scale, Different MOS)
Correlations

                                Phase III MOS     Phase I and II MOS
Scale                            r     1 - r²        r     1 - r²
Frequency                       .63     .60         .63     .60
Core Technical Importance       .58     .66         .58     .66
General Soldier Importance      .88     .23         .86     .26
Overall Job Importance          .75     .43         .75     .44
Difficulty                      .68     .54          -       -

While these correlations look high, suggesting that all MOS
appear rather similar on the Army Task Questionnaire, there is an
alternative way to present the data. For example, the Core
Technical average discriminant validity of .58 may be interpreted
to mean that any MOS shares, on the average, 34% of its variance
in task category ratings with any other MOS. That leaves 66% of
the MOS variance in task category ratings as unique. Again, the
task category ratings are highly reliable, suggesting that the
66% unique variation in task category ratings may be meaningful.
On the other hand, the discriminant validity correlations for
General Soldiering Importance ratings suggest that no more than
23% of the variation in task category ratings is MOS-specific.
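
The shared and unique variance percentages quoted above follow
directly from squaring the mean discriminant correlation. A minimal
check, using the Phase III values of r from Table 3.12 (the function
name is ours, not the report's):

```python
def variance_split(r):
    """Return (shared, unique) proportions of variance implied by a correlation r."""
    shared = r ** 2
    return shared, 1.0 - shared


for scale, r in [("Core Technical Importance", 0.58),
                 ("General Soldiering Importance", 0.88)]:
    shared, unique = variance_split(r)
    print(f"{scale}: shared {shared:.0%}, unique {unique:.0%}")
# Core Technical Importance: shared 34%, unique 66%
# General Soldiering Importance: shared 77%, unique 23%
```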

Convergence  of  Rating  Scales 

Convergent validities are the correlations across task
category means of the different rating scales within each MOS.
These are presented in Table 3.13 for Phase III MOS as well as
for the 10 MOS analyzed in Phases I and II. The pattern of
results from the Phase III MOS again parallels previous results.
The Frequency, Core Technical Importance, and Overall Job
Importance scales are essentially redundant. The General
Soldiering Importance scale shows somewhat less redundancy with
the other scales.

A  second  source  of  information  concerning  the  redundancy  of 
the  rating  scales  comes  from  the  average  off-diagonal  (different 
scale,  different  MOS)  correlations  presented  in  Table  3.14.  As 
noted  in  the  Phase  II  report,  the  normal  expectation  is  that 
these  correlations  will  not  be  high.  However,  they  are  in  the 
same  range  as  the  discriminant  validity  correlations  presented  in 
Table  3.12.  This  again  suggests  that  the  different  scales  are 
providing  the  same  information  about  the  MOS. 

An issue concerning the Army Task Questionnaire rating
scales is that their correlations are inflated by the multiple
zero ratings that MOS are expected to have in common. The
convergent and discriminant validity correlations for the 11
Phase III MOS were therefore recomputed using the previously
described index of 1s and 0s, which represents a dichotomy of
relevant versus non-relevant tasks. Tables 3.15 and 3.16 present
these recalculated validities along with the original validities
presented in Tables 3.12 and 3.13. With the exception of the
discriminant validities for the General Soldiering Importance
scale, the convergent and discriminant validities for the recoded
task category profiles show little, if any, difference from those
for the task category means. The Phase II report suggested that
it may not matter so much how frequently performed or how
important a task category is, but simply that it is relevant to
the job. That conclusion is reinforced by the present
observations.


Table 3.13

Mean Convergent Validities (Different Scale, Same MOS)
Correlations

                                Freq   Core Tech Imp   Gen Sold Imp   Over Job Imp
Core Technical Importance
  Phase III                     .99
  Phases I & II                 .99
General Soldier Importance
  Phase III                     .91        .90
  Phases I & II                 .91        .89
Overall Job Importance
  Phase III                     .97        .97             .97
  Phases I & II                 .97        .96             .98
Difficulty
  Phase III                     .93        .94             .92


Table 3.14

Mean Off-Diagonal (Different Scale, Different MOS) Correlations

                                Freq   Core Tech Imp   Gen Sold Imp   Over Job Imp
Core Technical Importance
  Phase III                     .59
  Phase II                      .60
General Soldier Importance
  Phase III                     .73        .69
  Phase II                      .70        .68
Overall Job Importance
  Phase III                     .67        .65             .81
  Phase II                      .68        .65             .80
Difficulty
  Phase III                     .61        .59             .74


Table 3.15

Mean Convergent Validities (Different Scale, Same MOS)
Correlations Based on Relevant (1) versus Non-relevant (0)
Indices and Original Mean Ratings

                                Freq   Core Tech Imp   Gen Sold Imp   Over Job Imp
Core Technical Importance
  1/0 Scoring                   .96
  Task Means                    .98
General Soldier Importance
  1/0 Scoring                   .90        .89
  Task Means                    .89        .87
Overall Job Importance
  1/0 Scoring                   .89        .96             .94
  Task Means                    .97        .96             .97
Difficulty
  1/0 Scoring                   .84        .88             .94            .93
  Task Means                    .93        .94             .91            .96


Table 3.16

Mean Discriminant Validity (Same Scale, Different MOS)
Correlations Based on Relevant (1) versus Non-relevant (0)
Indices and Original Mean Ratings

                                 1/0 Scoring      Task Mean Ratings
Scale                            r     1 - r²        r     1 - r²
Frequency                       .59     .65         .63     .60
Core Technical Importance       .56     .69         .58     .66
General Soldier Importance      .69     .52         .88     .23
Overall Job Importance          .66     .59         .75     .43
Difficulty                      .64     .59         .68     .54


Conclusions


Three  important  conclusions  may  be  reached  regarding  the 
Army  Task  Questionnaire.  First,  it  provides  highly  reliable 
descriptions  of  Army  MOS.  In  fact,  using  the  worst  case 
reliability  (.40  single-rater  reliability  for  55B  Core  Technical 
Importance),  only  14  raters  would  be  needed  to  boost  the 
reliability  of  mean  ratings  to  .90.  Thus,  we  reiterate  that 
political  concerns  regarding  the  extent  to  which  raters  represent 
all  of  the  important  constituencies  should  drive  decisions  about 
the  number  of  raters  to  use  in  future  synthetic  validity  efforts. 
Second,  the  Army  Task  Questionnaire,  particularly  Core  Technical 
Importance,  differentiates  the  MOS.  Third,  Frequency,  Core 
Technical  Importance,  and  Overall  Job  Importance  are  highly 
redundant.  They  basically  provide  information  about  whether  or 
not  a  task  category  is  relevant  to  the  MOS.  General  Soldiering 
Importance  is  the  least  redundant  of  the  scales  and,  reflecting 
the  structure  of  a  common  set  of  tasks  for  all  Army  MOS,  shows 
the  least  discrimination  among  the  MOS. 
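
The two reliability figures quoted in this chapter (.98 for 60
raters at a single-rater reliability of .50, and 14 raters to reach
.90 from a single-rater reliability of .40) are both consistent with
the Spearman-Brown prophecy formula. The passage does not name the
formula it used, so the sketch below is an assumption that happens
to reproduce the reported numbers; the function names are ours.

```python
import math


def spearman_brown(single_rater_rel, n_raters):
    """Reliability of the mean of n_raters ratings (Spearman-Brown)."""
    return n_raters * single_rater_rel / (1 + (n_raters - 1) * single_rater_rel)


def raters_needed(single_rater_rel, target_rel):
    """Smallest whole number of raters whose mean reaches target_rel."""
    n = target_rel * (1 - single_rater_rel) / (single_rater_rel * (1 - target_rel))
    # Small tolerance guards against floating-point overshoot on exact integers.
    return math.ceil(n - 1e-9)


print(round(spearman_brown(0.50, 60), 2))  # 0.98
print(raters_needed(0.40, 0.90))           # 14
```

Solving the formula for the number of raters gives 13.5 in the
worst case, which rounds up to the 14 raters cited in the text.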

Table  3.17  presents  summary  descriptions  of  all  MOS  from 
Phases  I,  II,  and  III  of  the  Synthetic  Validity  project.  The 
table  indicates  which  Task  Dimensions  from  the  Army  Task 
Questionnaire  are  of  major  relevance  to  the  Core  Technical 
component  for  each  MOS.  A  dimension  was  defined  as  relevant  for 
an  MOS  if  at  least  one  of  the  dimension's  constituent  task 
categories  had  a  Core  Technical  Importance  value  of  3.5  or 
greater.  Using  the  relevance  definition,  MOS  were  sorted  for 
visual  presentation  by  a  matrix  cluster  routine  (Wilkinson,  1988) 
that  simultaneously  orders  rows  (MOS)  and  columns  (Task 
Dimensions).  The  numbers  inside  the  matrix  cells  indicate  the 
number  of  task  categories  with  a  mean  Core  Technical  Importance 
value  greater  than  or  equal  to  3.5.  Clustering  of  MOS  is  pursued 
more  closely  in  the  following  chapter. 
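
The relevance rule used to build Table 3.17 can be illustrated as
follows. The 3.5 cutoff and the "at least one task category" rule
come from the text; the function name, the data layout, and the
ratings themselves are hypothetical, and the matrix clustering step
(Wilkinson, 1988) is not reproduced here.

```python
RELEVANCE_CUTOFF = 3.5  # mean Core Technical Importance threshold from the text


def relevant_counts(core_tech_means_by_dimension, cutoff=RELEVANCE_CUTOFF):
    """Map each relevant dimension to its number of task categories at or above cutoff.

    A dimension is kept only if at least one of its task categories qualifies.
    """
    counts = {}
    for dimension, means in core_tech_means_by_dimension.items():
        n = sum(1 for m in means if m >= cutoff)
        if n > 0:
            counts[dimension] = n
    return counts


# Hypothetical mean ratings for one MOS (illustrative values only).
hypothetical_mos = {
    "I (Communication)": [3.9, 3.6, 2.1],
    "M (Individual Combat)": [4.2, 3.5, 3.4, 1.0],
    "H (Clerical)": [0.8, 1.2],
}
print(relevant_counts(hypothetical_mos))
# {'I (Communication)': 2, 'M (Individual Combat)': 2}
```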

The  data  in  Table  3.17  render  no  significant  deviations  from 
what  one  would  intuitively  expect.  Because  communication  is 
important  for  performance  in  any  job,  military  or  civilian,  it  is 
not  surprising  that  Communication  is  a  relevant  dimension  for  all 
MOS studied. Also, it is apparent that, while all MOS endorse the
Communication Dimension as important, they differ in terms of the
number of relevant task categories within the dimension.
Returning to Table 3.3, the highlighted task categories within
Communication show a pattern consistent with the relative
characteristics of the MOS.

As expected, Dimension M (Combat) is relevant for combat
MOS, Dimension B (Electrical and Electronic Maintenance) is
relevant for MOS that repair electrical and electronic systems,
and Dimension H (Clerical) is relevant for MOS that must maintain
records or accounts and/or perform general office work. Note
that Dimension L (Air Traffic Control) is not relevant for any
MOS because none of the MOS studied are even remotely involved in
controlling air traffic. At first glance, it may seem surprising
that the Combat Dimension did not appear relevant to all MOS.
However, because Core Technical Importance ratings were used to
define dimension relevance, one might expect combat to be a less
significant part of the MOS-specific aspect of most of the MOS
studied.


Table  3.17 

Task  Dimensions  with  Major  Relevance  to  Phase  I,  II,  and  III  MOS 


_ Task  Dimension _ 

MOS  IDAMNPOGJCKLEHQBF 


95B  Military  Police  612 

11B  Infantryman  5  2 

19K  Armor  Crewman  512 

16S  MANPADS  Crewman  412 

88M  Motor  Transport  Operat  211 
12B  Combat  Engineer  312 

13B  Cannon  Crewman  212 

54B  Chemical  Ops  Specialist  311 
31D  Mobile  Radio  Operator  311 
27E  TOW/Dragon  Repair  113 

63B  Light  Vehicle  Repairer  112 
31C  Single  Channel  Radio  Op  3  1  1 

29E  Radio  Repairer  1 

94B  Food  Service  Specialist  1 
91A  Medical  Specialist  1 

51B  Carpentry/Masonry  Spec  1 
71L  Admin  Specialist  1 

76Y  Unit  Supply  Specialist  1 
67N  Utility  Helicopter  Rep  1  3 

55B  Ammunition  Specialist  1  1 

96B  Intelligence  Analyst  6 


8 

7 

4 

5 
1 

6 
3 
1 


1 

3 


1 

4 


1  1 
111 
1  1 
1 

1 


1 


1 


1 


1 

1  1 


3 


1 

1  3  11 

15  1 

1 

1  12 
5  1 

1 
1 

2 

3 

4 

1  1 
1 

1 


Note. Task dimensions are:

I = Communication              D = Vehicle and Equipment Operations
A = Mechanical Maintenance     M = Individual Combat
N = Crew-served Weapons        P = Identify Targets
O = First Aid                  G = Technical Drawings
J = Analyze Information        C = Pack and Load
K = Math and Data              L = Air Traffic Control
E = Construct/Assemble         H = Clerical
Q = Supervision                B = Electrical and Electronic
F = Technical Procedures           Maintenance


•Number  of  task  categories,  within  the  dimension,  judged  relevant  to  the  MOS. 


By the nature of the definition, Core Technical Importance
ratings describe the unique characteristics of an MOS. On the
other hand, Frequency and Overall Job Importance, because they
are highly related to both the General Soldiering and Core
Technical job components, may be better overall descriptors.
Indeed, if the mean convergent validities presented in Table 3.13
are entered into the Multiple R formula for two predictors, the
result shows that Core Technical Importance and General
Soldiering Importance may be combined to account for 99.6% of the
variance in the Overall Job Importance ratings. The following
chapter examines both Core Technical Importance ratings and
Overall Job Importance ratings as bases for the development of
synthetic predictor equations.
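
For reference, the standard two-predictor squared multiple
correlation is R² = (r_y1² + r_y2² - 2 r_y1 r_y2 r_12) / (1 - r_12²).
The sketch below applies it to the rounded Phase III entries of
Table 3.13 (.97 for each importance scale with Overall Job
Importance, .90 between the two importance scales). With these
two-decimal inputs the result is about .99; the 99.6% reported in
the text presumably reflects unrounded correlations, which are not
available here. The function name is ours.

```python
def multiple_r_squared(r_y1, r_y2, r_12):
    """Squared multiple correlation of y on two predictors from their correlations."""
    numerator = r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12
    return numerator / (1 - r_12 ** 2)


# Rounded Phase III values read from Table 3.13.
r2 = multiple_r_squared(r_y1=0.97, r_y2=0.97, r_12=0.90)
print(f"{r2:.3f}")
```

Because the predictors are themselves correlated at about .90, the
result is very sensitive to rounding in the inputs.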

Chapter 4: Formation of Job Performance Prediction
Equations and Evaluation of Their Validity

Norman  G.  Peterson,  Cynthia  K.  Owens-Kurtz,  and  Rodney  L.  Rosse 

(PDRII ) 


In this chapter we describe the formation of prediction
equations using the ratings collected with the Army Task
Questionnaires, ratings of the validity of a set of predictor
constructs for the task categories (collected earlier in the
project; see Peterson, Owens-Kurtz, Hoffman, Arabian, & Whetzel,
1990), and empirical estimates of the correlations of measures of
the predictor constructs. We report and evaluate the results
when those equations are applied to data from samples for 18 MOS
collected as part of Project A. We compare results from the
application of the synthetic methodology to results from the
application of the more traditional validity generalization
methodology.

Before proceeding with the details of the methods and
computations, we provide a more general overview of the elements
that go into the synthetic validity methodology developed for
this project. Figure 4.1 shows the elements of the synthetic
models that we describe in this chapter. Starting at the left
side of the figure, note that attribute items are tied to job
descriptor components or items (task category items)¹ by ratings
of the validity of each attribute for predicting performance on
each of the descriptor items. Note also that these validity
ratings are made by psychologists. Thus, the attributes are here
cast clearly as predictors of very discrete and relatively small
pieces of Army jobs. We refer to weights obtained from these
ratings as "attribute-by-component" weights.

Moving across the figure to the right, note next that the
task category items are tied to a specific MOS by officers/NCOs
who make ratings of the frequency, importance, and difficulty of
each item with respect to a particular MOS. Note also that these
ratings may be made with regard to overall performance or for
slightly more specific parts of MOS job performance, such as Core
Technical or General Soldiering Proficiency. Weights obtained
from these kinds of ratings are referred to as "component-by-job"
or "criticality" weights.


¹In an earlier phase of this project, other types of job
descriptor items were used: job activity items and a "hybrid"
item type that combined the task category and job activity types.
Analyses of data from earlier phases indicated that the task
category item type seemed most acceptable to SMEs, provided the
most reliable ratings, and led to the highest levels of validity
when used in synthetic equations.


Formation of Equations and Evaluation of Their Validity


As in the evaluations of synthetic equations derived for
earlier phases of this project (Wise, Peterson, Rosse, &
Campbell, 1989; Oppler, Peterson, & Wise, 1990), the present
evaluations focus on two general criteria: absolute and
discriminant validity. Absolute validity refers to the degree to
which the synthetic equations are able to predict performance in
the specific jobs for which they were developed. For example,
how well does a particular synthetic equation derived for
soldiers in 19K predict Core Technical Proficiency in that MOS?
Data from Project A were used to obtain empirical estimates of
these validities. The second criterion, discriminant validity,
refers to the degree to which performance in each job is better
predicted by the synthetic equation developed specifically for
that job than by the synthetic equations developed for the other
MOS. For instance, how much better can the synthetic equation
developed for 19K predict Core Technical Proficiency in that MOS
than the synthetic equations developed to predict Core Technical
Proficiency in each of the other MOS? Empirical estimates of
correlations relevant to this criterion were also derived from
data collected in Project A.
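
One simple way to picture the two criteria is a square matrix in
which entry (i, j) is the validity, within MOS i, of the synthetic
equation built for MOS j. The sketch below is an assumption about
how such a summary could be computed, not the project's documented
index: absolute validity is taken as the mean diagonal entry, and
discriminant validity as that mean minus the mean off-diagonal
entry. All numbers are hypothetical.

```python
def absolute_and_discriminant(validity):
    """Summarize a square validity matrix.

    validity[i][j] is the correlation, within MOS i, between job performance
    and scores from the synthetic equation developed for MOS j.
    Returns (mean own-equation validity, own minus mean other-equation validity).
    """
    k = len(validity)
    absolute = sum(validity[i][i] for i in range(k)) / k
    off_diagonal = [validity[i][j] for i in range(k) for j in range(k) if i != j]
    discriminant = absolute - sum(off_diagonal) / len(off_diagonal)
    return absolute, discriminant


# Hypothetical validities for three MOS (illustrative values only).
hypothetical = [
    [0.60, 0.55, 0.50],
    [0.52, 0.58, 0.49],
    [0.47, 0.50, 0.62],
]
absolute, discriminant = absolute_and_discriminant(hypothetical)
print(round(absolute, 3), round(discriminant, 3))  # 0.6 0.095
```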

The synthetic equations whose absolute and discriminant
validities are reported here were based on the job component
model described above. The equations required two different sets
of weights, attribute-by-component (for predicting MOS
performance at the individual component level) and
component-by-job (for weighting the individual component
prediction equations to form an overall prediction equation).
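
As a rough illustration of how the two sets of weights could
combine, the sketch below forms one weight per attribute by summing,
over components, the attribute-by-component weight times the
component-by-job weight. This simple weighted-sum form is an
assumption made for illustration; the specific weighting methods
actually compared are described later in the chapter. All names and
numbers are hypothetical.

```python
def synthetic_weights(attr_by_component, component_by_job):
    """Combine the two weight sets into one weight per attribute for a single MOS.

    attr_by_component[a][c]: rated validity of attribute a for component c.
    component_by_job[c]: criticality of component c for the MOS.
    """
    n_components = len(component_by_job)
    return [
        sum(row[c] * component_by_job[c] for c in range(n_components))
        for row in attr_by_component
    ]


# Hypothetical example: three attributes, two task-category components.
validity_ratings = [
    [0.5, 0.1],
    [0.2, 0.4],
    [0.0, 0.3],
]
criticality = [3.0, 1.0]
weights = synthetic_weights(validity_ratings, criticality)
print([round(w, 2) for w in weights])  # [1.6, 1.0, 0.3]
```

An applicant's predicted score under this sketch would be the sum of
these attribute weights times the applicant's standardized attribute
scores.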

We  examined  the  degree  to  which  the  absolute  and 
discriminant  validities  of  the  synthetic  equations  depend  on  the 
particular  methods  (described  below)  by  which  these  sets  of 
weights  are  formulated. 

Predictor  Measure  and  Job  Performance  Data 


The  predictor  measure  and  job  performance  data  used  in  these 
analyses  were  taken  from  the  Project  A  Concurrent  Validation  (CV) 
data  base.  The  overall  data  set  included  predictor  and  job 
performance  measures  collected  on  soldiers  in  19  different  jobs. 
Eighteen  of  these  jobs  or  Military  Occupational  Specialties  (MOS) 
were  included  in  either  Phase  I,  II,  or  III  of  this  project.  We 
used  all  18  MOS  to  evaluate  the  validity  of  the  synthetic 
equations.  Table  4.1  shows  the  designations  and  names  of  these 
MOS,  as  well  as  the  phase  in  which  they  were  included  and  their 
CV  sample  size. 

The individual predictor measures included in the Project A
battery have been described in detail by Peterson, Hough, et al.
(1990). Owens-Kurtz and Peterson (1989) have described the
identification of specific measures in the Project A data set
corresponding to 26 of the 30 items in the Synthetic Validity
Project's attribute taxonomy. These 26 measures were used in the
analyses reported here. (Thus, validity ratings were not used
for the four attributes not associated with Project A measures.)

Table 4.1

MOS Included in Synthetic Validity Investigations, Phase of
Project, and Sample Size for Project A Concurrent Validation Data

                                                          CV Sample
MOS     Label                                 SV Phase      Size
11B     Infantryman                               1          491
12B     Combat Engineer                           3          544
13B     Cannon Crewman                            3          464
16S     MANPADS Crewman                           2          338
19K     Armor Crewman                             2          394
27E     TOW/Dragon Repairer                       3          123
29E*    Radio Repairer                            3           --
31C     Single Channel Radio Operator             3          289
31D*    Mobile Subscriber Equipment
        Transmission System Operator              3           --
51B     Carpentry and Masonry Specialist          3           69
54B     Chemical Operations Specialist            3          340
55B     Ammunition Specialist                     3          203
63B     Light Wheel Vehicle Mechanic              1          478
67N     Utility Helicopter Repairer               2          238
71L     Administrative Specialist                 1          427
76Y     Unit Supply Specialist                    2          444
88M     Motor Transport Operator                  2          507
91A     Medical Specialist                        2          392
94B     Food Service Specialist                   2          368
95B     Military Police                           3          597
96B*    Intelligence Analyst                      3           --

*No Project A Concurrent Validation data available.


Wise, Campbell, McHenry, and Hanser (1986), and Campbell,
McHenry, and Wise (1990), have described the identification and
measurement of five job performance constructs of interest to the
Army: job-specific proficiency (called "Core Technical
Proficiency" or CTP), general soldiering proficiency, effort and
leadership, personal discipline, and physical fitness and
military bearing. For the synthetic validation analyses reported
here we chose to use the job-specific proficiency measures, plus
an overall performance measure that is a weighted combination of
all five construct measures (Sadacca, Campbell, White, & DiFazio,
1989). (The job-specific measures were composed of items from
written tests of job knowledge and hands-on work samples.) The
decision  to  use  these  two  measures  was  made  for  three  reasons. 
First,  and  primarily,  the  Synthetic  Validity  Project  is  most 
closely  focused  on  the  development  of  prediction  composites  for 
job-specific  aspects  of  performance.  Second,  Wise,  McHenry,  and 
Campbell  (1990)  showed  that  the  same  predictor  measures  are 
optimal  for  a  wide  range  of  jobs  in  predicting  all  but  job- 
specific  proficiency.  Significant  differences  across  jobs  were 
found  in  the  predictors  of  job-specific  proficiency.  Thus,  it 
appears  that  discriminant  validity  could  not  be  legitimately 
expected  for  any  other  criterion  measure.  These  first  two 
reasons  argue  strongly  for  the  inclusion  of  the  Core  Technical 
Proficiency  construct  as  a  separate  criterion,  but  none  of  the 
other  four  separately.  However,  thirdly,  it  is  of  scientific  and 
practical  interest  to  determine  the  validity  of  the  synthetic 
methodology  for  predicting  Overall  Job  Performance,  in  particular 
if  Overall  Performance  is  less  well  predicted  than  Core  Technical 
Proficiency. 

As noted earlier, the numbers of soldiers with complete data
on the predictor and criterion measures in the Concurrent
Validation samples corresponding to the Synthetic Validity
Project MOS are reported in Table 4.1. These samples differed
somewhat in terms of the heterogeneity and mean levels of the
predictor scores. Also, because all were selected job
incumbents, they had higher and less variable predictor scores in
comparison to the overall pool from which applicants are drawn.
Common practice has been to use a multivariate correction to
adjust covariances and correlations for differences in
heterogeneity (Lord & Novick, 1968). This procedure corrects for
effects of restriction in range due to explicit selection on the
subtests of the ASVAB and incidental selection on other Project A
predictors. A second correction was made for self-selection into
each occupational specialty and attrition after initial
enlistment.

We used a two-step procedure to adjust for range restriction
due to both sources of selection. The 1980 Youth Population
sample to which the Armed Services Vocational Aptitude Battery
(ASVAB) was administered was used as the target population.

First, we computed the covariances of the 26 predictor measures
(corresponding to the attributes) for each of the 18 MOS-specific
samples and adjusted these covariances for differences between
the samples and the Youth Population in the covariances of the
ASVAB subtests. (See footnote 2.) This provided us with estimates
of the covariances among the attribute measures for the Youth
Population, had all of the Project A predictor measures been
administered to them. (Assumptions underlying these estimates
are described in Lord and Novick, 1968.)

Footnote 2. We initially used a different, less traditional
method of estimating the population predictor intercorrelation
matrix. However, follow-up simulation analyses convinced us that
the more traditional method was best. See the Addendum to this
report for a full description of the investigation of this matter.

Second,  we  computed  covariances  for  each  of  the  18  job- 
specific  samples  that  included  26  predictors  plus  the  Core 
Technical  and  Overall  Performance  criterion  construct  scores.  We 
then adjusted these covariances for differences between the
job-specific sample and the estimated Youth Population covariances.
These  corrections  provided  estimates  of  the  covariances  among  the 
26  predictors  and  Core  Technical  and  Overall  Performance  in  each 
of  the  18  MOS  for  the  1980  Youth  Population. 
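
The kind of adjustment described in the two steps above can be
illustrated with the multivariate range-restriction correction
usually attributed to Pearson and Lawley and presented in Lord and
Novick (1968). The Python sketch below is illustrative only. It
assumes NumPy is available; the function name correct_covariance
and the toy numbers are hypothetical, and it is not the code used
in the project.

```python
import numpy as np

def correct_covariance(sample_cov, n_explicit, pop_cov_explicit):
    """Estimate a population covariance matrix from a restricted sample.

    sample_cov       : (p+q, p+q) sample covariance matrix; the first p
                       variables are the explicit selection variables (x),
                       the remaining q are incidentally selected (y).
    n_explicit       : p, the number of explicit selection variables.
    pop_cov_explicit : (p, p) population covariance of the x variables.
    """
    V = np.asarray(sample_cov, dtype=float)
    S_xx = np.asarray(pop_cov_explicit, dtype=float)
    p = n_explicit
    V_xx, V_xy, V_yy = V[:p, :p], V[:p, p:], V[p:, p:]
    # Regression weights of y on x; assumed unchanged by selection on x.
    B = np.linalg.solve(V_xx, V_xy)
    S_xy = S_xx @ B
    # Residual covariance of y given x is also assumed unchanged.
    S_yy = V_yy - V_xy.T @ B + B.T @ S_xx @ B
    return np.block([[S_xx, S_xy], [S_xy.T, S_yy]])

# Toy example (hypothetical numbers): one selection variable whose
# variance is 0.5 in the sample and 1.0 in the population.
sample = np.array([[0.5, 0.3],
                   [0.3, 1.0]])
corrected = correct_covariance(sample, 1, np.array([[1.0]]))
r_sample = sample[0, 1] / np.sqrt(sample[0, 0] * sample[1, 1])
r_corrected = corrected[0, 1] / np.sqrt(corrected[0, 0] * corrected[1, 1])
```

Under this reading, the first step would treat the ASVAB subtests
as x and the 26 attribute measures as y, and the second step would
treat the 26 attribute measures (with their step-one population
estimates) as x and the two criterion scores as y.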

Table  4.2  shows  the  means  and  standard  deviations  of  the 
predictor  measures  in  the  total  CV  sample.  The  means  and 
standard  deviations  for  each  of  the  attribute  measures  in  the 
samples  for  each  of  the  18  Synthetic  Validity  MOS  are  not  shown 
here  in  the  interest  of  conserving  space,  but  are  given  in 
Appendix  D  of  this  report.  The  estimated  standard  deviations  for 
the Youth Population are also shown in Table 4.2. (The means for
the Youth Population are not used in the following analyses and
so were not estimated.)

Method  of  Forming  Equations 

Once  the  covariances  of  the  predictor  and  criterion  measures 
are  estimated  for  each  job,  validities  for  any  given  composite  of 
the  predictors  can  be  estimated  through  relatively  direct  matrix 
manipulations.  For  the  equations  reported  here,  there  are  two 
steps  in  forming  a  synthetic  predictor  score  composite.  First, 
scores  on  individual  Project  A  measures  of  the  attributes  are 
standardized,  weighted  (by  the  psychologists'  ratings  of 
validities),  and  summed  to  form  a  predicted  score  for  each  job 
component.  Second,  these  predicted  job  component  scores  are  then 
weighted  (according  to  job  description  ratings  by  the 
officers/NCOs)  and  summed  to  form  the  predicted  total  job 
performance  score. 
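
The two weighting steps just described, and the "relatively direct
matrix manipulations" used to estimate a composite's validity, can
be sketched as follows. This is a minimal illustration, assuming
NumPy, standardized predictors, and component scores that are not
rescaled between the two steps; the function names and all numbers
are hypothetical.

```python
import numpy as np

def synthetic_weights(attr_by_comp, comp_by_job):
    """Collapse the two weighting steps into one weight per attribute.

    attr_by_comp : (n_attributes, n_components) attribute-by-component weights
    comp_by_job  : (n_components,) component-by-job ("criticality") weights
    """
    A = np.asarray(attr_by_comp, dtype=float)
    c = np.asarray(comp_by_job, dtype=float)
    # A weighted sum of weighted sums is itself a weighted sum.
    return A @ c

def composite_validity(weights, r_xx, r_xy):
    """Correlation between a weighted composite of standardized predictors
    and the criterion, given the predictor intercorrelations (r_xx) and
    the predictor-criterion correlations (r_xy)."""
    w = np.asarray(weights, dtype=float)
    r_xx = np.asarray(r_xx, dtype=float)
    r_xy = np.asarray(r_xy, dtype=float)
    return float((w @ r_xy) / np.sqrt(w @ r_xx @ w))

# Hypothetical example: 3 attributes, 2 job components.
A = np.array([[0.4, 0.1],
              [0.2, 0.3],
              [0.0, 0.5]])
c = np.array([4.0, 2.0])
w = synthetic_weights(A, c)
r_xx = np.array([[1.0, 0.5, 0.2],
                 [0.5, 1.0, 0.3],
                 [0.2, 0.3, 1.0]])
r_xy = np.array([0.5, 0.4, 0.3])
validity = composite_validity(w, r_xx, r_xy)
```

Because only the relative sizes of the weights matter for a
correlation, rescaling w by any positive constant leaves the
estimated validity unchanged.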

We  developed  several  methods  of  forming  equations  using  the 
basic  steps  outlined  just  above.  These  methods  varied  according 
to  the  criterion  being  predicted,  the  method  of  forming  the 
attribute  by  component  weights,  the  method  of  forming  the 
component  by  job  weights,  and  the  techniques  used  to  directly 
"reduce"  the  number  of  predictor  measures  included  in  the  final 
equation.  We  turn  now  to  a  description  of  these  variations. 

The  criterion  predicted.  As  noted  earlier,  we  were 
interested  in  the  extent  to  which  the  synthetic  methodology  could 
provide  prediction  equations  for  both  Core  Technical  and  Overall 
Performance.  Scores  on  both  criteria  were  available  from  the 
Project  A  data  base,  so  it  was  possible  to  evaluate  the  validity 
of  both  types  of  equations.  In  terms  of  developing  the  synthetic 
equations, only the component-by-job weights are affected. (The
attribute-by-component weights are not affected since they are
derived from judgments made by psychologists about the validity
of an attribute for performance on a discrete job component,
i.e., a particular task category.) Recall from Chapter 3 that
the Army SMEs provided judgments about the importance of task
categories for Core Technical, General Soldiering, and Overall
Performance. Therefore, when the object of prediction was Core
Technical Proficiency, the task category importance judgments for
Core Technical Proficiency were used, and when the object of
prediction was Overall Performance, the task category importance
judgments for Overall Performance were used.


Table 4.2

Means and Standard Deviations for 9 ASVAB Subtests and 26
Attribute Measures: Project A Total CV Sample

                                                  All MOS       1980
                                                               Population
Measure                                   N      Mean   Std Dev  Std Dev

ASVAB Subtests
GS  General Science                      7045    51.40    8.13    10.00
AR  Arithmetic Reasoning                 7045    52.87    7.28    10.00
VE  Verbal                               7045    50.96    6.44    10.00
NO  Numeric Operations                   7045    52.71    6.38    10.00
CS  Coding Speed                         7045    51.28    6.68    10.00
AS  Auto/Shop Information                7045    54.14    8.53    10.00
MK  Mathematics Knowledge                7045    50.98    7.39    10.00
MC  Mechanical Comprehension             7045    53.11    8.17    10.00
EI  Electronics Information              7045    52.14    7.55    10.00

Attribute Measures
Verbal Ability                           7045   102.37   13.51    18.97
Reasoning                                7045   102.44   16.46    19.27
Number Ability                           7045   100.00   17.40    25.35
Spatial Ability                          7045   100.00   17.43    21.18
Mental Information Processing            7045   100.00   23.59    24.71
Perceptual Speed & Accuracy              7045   100.00   17.64    20.43
Memory                                   7045    50.00   14.22    14.95
Mechanical Comprehension                 7045   133.33   17.63    22.85
Eye-Limb Coordination                    7045     0      14.01    14.78
Precision                                7045     0      18.84    20.39
Movement Judgment                        7045     6.62    9.00     9.38
Hand & Finger Dexterity                  7045    16.73    7.76     7.86
Involvement in Athletics                 7045    13.90    3.06     3.07
Work Orientation                         7045   150.00   26.12    26.76
Cooperation/Stability                    7045   150.00   26.40    26.94
Energy                                   7045    48.43    5.99     6.09
Conscientiousness                        7045   102.48   16.52    16.66
Dominance/Confidence                     7045   100.00   18.12    18.92
Interest in Using Tools                  7045   200.00   32.93    34.79
Interest in Rugged Activities            7045   150.00   26.01    26.46
Interest in Protective Services          7045   100.00   17.03    17.20
Interest in Technical Activities         7045   150.00   23.55    23.57
Interest in Science                      7045   200.00   29.23    29.51
Interest in Leadership                   7045    40.07    8.45     8.59
Interest in Artistic Activities          7045    14.13    4.10     4.16
Interest in Efficiency & Organization    7045   200.00   29.95    30.71

Attribute-by-component weights. Three different methods
were used to form the attribute-by-component weights. One method
for developing prediction equations for each job component used
attribute weights that were directly proportional to the
attribute-by-component validities estimated by psychologists.
This was called the validity method. A second alternative was to
use zero or one weights (called the 0-1 attribute weight method).
In this alternative, all attributes with mean validity ratings
for a component less than 3.5 (corresponding to a validity
coefficient of .30) were given a weight of 0 and all remaining
attributes were given a weight of 1. A third alternative was
identical  to  the  zero-one  weight,  except  that  when  a  mean 
validity  rating  was  3.5  or  greater,  the  weight  given  was 
proportional  (as  in  the  first  method)  rather  than  set  to  1.  This 
was  called  the  0-mean  attribute  weight  method. 
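
The three attribute weighting rules can be written compactly. The
sketch below is a hypothetical illustration: the function name and
the example ratings are invented, and for simplicity it uses the
mean rating itself as the "proportional" weight, whereas the text
describes weights proportional to the estimated validities (the
rating-to-validity conversion is not reproduced here).

```python
def attribute_weights(mean_ratings, method, cutoff=3.5):
    """Return attribute weights for one job component.

    mean_ratings : mean validity ratings, one per attribute
    method       : "validity", "0-1", or "0-mean"
    cutoff       : ratings below this value are zeroed by the last two methods
    """
    if method == "validity":
        # Weight proportional to the rating, no cutoff applied.
        return [float(r) for r in mean_ratings]
    if method == "0-1":
        # Unit weight at or above the cutoff, zero otherwise.
        return [1.0 if r >= cutoff else 0.0 for r in mean_ratings]
    if method == "0-mean":
        # Proportional weight at or above the cutoff, zero otherwise.
        return [float(r) if r >= cutoff else 0.0 for r in mean_ratings]
    raise ValueError(f"unknown method: {method}")

# Hypothetical mean ratings for four attributes on one component.
ratings = [2.0, 3.5, 4.2, 3.4]
w_validity = attribute_weights(ratings, "validity")
w_zero_one = attribute_weights(ratings, "0-1")
w_zero_mean = attribute_weights(ratings, "0-mean")
```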

Component-by-job or "criticality" weights. With regard to
these "criticality" weights, we were primarily interested in two
topics. First, results in prior phases showed that the use of
cutoffs or thresholds on criticality weights (that is, setting
lower weights to zero) produced higher discriminant validities
without sacrificing much absolute validity. Second, we were
interested in the extent to which the grouping of similar MOS
into clusters might produce synthetic equations with higher
absolute or discriminant validities than those produced by
MOS-specific equations. Therefore, we used four types of
criticality weights. These weights were based on the mean task
importance ratings (for Core Technical or Overall Performance, as
appropriate; see "The criterion predicted" above) computed
for an MOS or for a cluster of MOS. Specifically, they were:

1.  Mean  importance  ratings  computed  across  all  SMEs  for  an 
MOS,  dubbed  "MOS  Mean  Component  Weights." 

2.  Mean  importance  ratings  computed  across  all  SMEs  for  an 
MOS  transformed  such  that  means  <  3.5  were  set  to  zero,  and  means 
above 3.5 were left as is, dubbed "MOS Threshold Component
Weights."

3.  Mean  of  "MOS  mean"  importance  ratings  for  MOS  that  were 
similar  in  terms  of  their  mean  task  importance  profiles 
[determined  by  performing  a  Ward  &  Hook  (Wilkinson,  1988) 
clustering  of  all  MOS  based  on  the  appropriate  profiles;  see 
below],  dubbed  "Cluster  Mean  Component  Weights." 

4. Transformed "Cluster Mean" ratings, using the same
cutoff criteria (set to zero if < 3.5), dubbed "Cluster Threshold
Component Weights."
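
The four kinds of criticality weights listed above can be sketched
in a few lines. The helper names and the ratings below are
hypothetical, and the sketch assumes that a mean of exactly 3.5 is
retained, since the text only says that means below 3.5 are set to
zero.

```python
def mean_profile(profiles):
    """Average a list of equal-length rating profiles, column by column."""
    n = len(profiles)
    return [sum(col) / n for col in zip(*profiles)]

def threshold(weights, cutoff=3.5):
    """Set weights below the cutoff to zero; leave the rest unchanged."""
    return [w if w >= cutoff else 0.0 for w in weights]

# Hypothetical SME ratings: rows are SMEs, columns are task categories.
sme_ratings_a = [[4.0, 2.0, 3.0], [5.0, 3.0, 4.0]]   # MOS "A"
sme_ratings_b = [[3.0, 4.0, 3.0], [3.0, 4.0, 5.0]]   # MOS "B"

# 1. MOS Mean Component Weights
mos_mean_a = mean_profile(sme_ratings_a)
mos_mean_b = mean_profile(sme_ratings_b)
# 2. MOS Threshold Component Weights
mos_threshold_a = threshold(mos_mean_a)
# 3. Cluster Mean Component Weights (A and B assumed to share a cluster)
cluster_mean = mean_profile([mos_mean_a, mos_mean_b])
# 4. Cluster Threshold Component Weights
cluster_threshold = threshold(cluster_mean)
```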
We clustered the MOS by correlating their mean task category
importance  profiles  (their  mean  scores  across  all  96  task 
categories),  and  then  performing  a  Ward  &  Hook  clustering  on  the 
correlation  matrices  (Wilkinson,  1988).  This  was  done  for  both 
the  Core  Technical  importance  ratings  and  the  Overall  importance 
ratings.  These  analyses  were  carried  out  for  all  21  MOS  included 
in  the  Synthetic  Validity  Project,  not  just  for  the  18  that  were 
in  the  Project  A  data  base,  since  the  appropriate  data  were 
available  and  the  larger  sample  should  provide  more  stable 
results.  Tables  4.3  and  4.4  show  the  correlation  matrices  and 
Figures  4.2  and  4.3  show  the  Ward  &  Hook  results  as  a  tree 
diagram.  The  numbers  on  the  far  right  of  the  diagram  are  the 
distance  metrics  (1-Pearson  r)  just  before  two  entities  are 
combined.  We  selected  four  clusters  as  the  most  meaningful 
solution  for  the  Core  Technical  importance  ratings,  and  named  the 
clusters  Electronics,  Administration/Support,  Combat,  and 
Mechanical/Construction,  based  on  the  MOS  included  in  each 
cluster. We selected three clusters as the most meaningful
solution for the Overall importance ratings and named the
clusters Electronics/Repair, Administration/Support, and Combat.
These clusters are summarized in Figure 4.4.
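
The distance measure used for the clustering (1 minus the Pearson
correlation between two MOS profiles) can be computed as in the
sketch below. It assumes NumPy; the function name and the toy
profiles are hypothetical, and the sketch stops at the distance
matrix because the details of how the original software applied the
clustering criterion to these distances are not described here.

```python
import numpy as np

def profile_distance(profiles):
    """Return the matrix of 1 - Pearson r between every pair of rows.

    profiles : (n_mos, n_task_categories) mean importance ratings
    """
    r = np.corrcoef(np.asarray(profiles, dtype=float))  # rows are MOS
    return 1.0 - r

# Hypothetical mean importance profiles for three MOS on four categories.
profiles = np.array([[4.5, 2.5, 3.5, 1.0],
                     [4.0, 2.0, 3.0, 1.5],
                     [1.0, 4.0, 2.0, 4.5]])
d = profile_distance(profiles)
```

A matrix like d could then be handed to a hierarchical clustering
routine (for example, SciPy's scipy.cluster.hierarchy.linkage after
conversion to condensed form) to produce a tree diagram of the kind
shown in the figures.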

Reduction  of  number  of  predictors  in  the  synthetic  equation. 
As  a  practical  matter,  it  is  unlikely  that  the  Army  can  use  all 
26  attribute  measures  to  predict  MOS  performance.  Therefore,  it 
is  of  some  interest  to  explore  methods  of  reducing  the  number  of 
predictors  used  in  synthetic  equations  and  to  evaluate  the 
effects  of  those  methods  on  the  validity  of  the  equations. 

One  obvious  method  for  reducing  the  number  of  predictors  is 
to  use  only  the  ASVAB  measures.  Three  Project  A  predictor  score 
composites  that  matched  the  Synthetic  Validity  attributes 
consisted  only  or  largely  of  ASVAB  measures.  These  three 
composites  closely  parallel  measures  of  the  ASVAB  Verbal, 
Numerical,  and  Technical  factors.  We  constructed  synthetic 
equations  using  only  these  three  measures,  with  their  associated 
attribute-by-component weights. This method was called the ASVAB
reduction.

We used two other methods that employed stepwise regression
to reduce the number of predictors. In the first method, the
full synthetic equation is first constructed using all
attribute-by-component and component-by-job weights, and then the
predictor contributing the least to predicting the full synthetic
equation is dropped. This process continues until the reduced
equation correlates less than .95 with the full equation. We
selected the criterion of .95 because this ensures that the
correlation of the reduced equation with some external variable
(such as job performance) will be reasonably close to the
correlation of the full equation with the external variable.
This method was called the .95 stepwise
reduction.

[Table omitted: Correlation Matrix of 21 MOS, Based on Mean Core
Technical Importance Profiles]

[Table omitted: Correlation Matrix of 21 MOS, Based on Mean Overall
Importance Profiles on 96 ...]

[Figure 4.2. Cluster analysis of 21 MOS based on Mean Core
Technical Importance Ratings on 96 tasks.]

[Figure 4.3. Cluster analysis of 21 MOS based on Mean Overall
Importance Ratings on 96 tasks.]

[Figure 4.4, partially recoverable]

OVERALL PERFORMANCE
Cluster                 MOS
...                     27E, 29E, 31C, 31D, 63B, 67N
Administration/...      55B, ..., 91A, ...
3. ...                  12B, 13B, 16S, 19K, ...

Note: No attribute x job performance validity matrix is
available for the underlined MOS (29E, 31D, 96B). These MOS were
not included in the synthetic equation analyses.

Figure 4.4. MOS clusters based on Mean Task Importance for Core
Technical Proficiency and Overall Performance.

The second stepwise reduction method used the same reduction
technique  except  that  the  reduction  continued  until  only  five 
predictors  remained  in  the  equation.  Five  was  chosen 
arbitrarily,  but  it  seemed  to  be  a  reasonable  number  from  a 
practical  viewpoint.  This  method  was  called  the  top  five 
stepwise  reduction. 
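
Both stepwise reductions can be sketched as a backward elimination
against the full synthetic composite. The code below is an
illustrative reading rather than the project's procedure: it assumes
NumPy and standardized predictors, it re-estimates the best weights
for the remaining predictors at each step, and for the .95 rule it
assumes that the last equation still correlating at least .95 with
the full equation is the one retained. All names and numbers are
hypothetical.

```python
import numpy as np

def multiple_r_with_composite(r_xx, w, keep):
    """Multiple correlation between the full composite (weights w) and the
    best linear combination of the predictors indexed by keep."""
    keep = list(keep)
    c = r_xx @ w                      # predictor-composite covariances
    var_full = w @ r_xx @ w           # variance of the full composite
    c_s = c[keep]
    r_ss = r_xx[np.ix_(keep, keep)]
    return float(np.sqrt(c_s @ np.linalg.solve(r_ss, c_s) / var_full))

def backward_reduce(r_xx, w, min_r=None, n_keep=None):
    """Drop predictors one at a time, always removing the one whose loss
    costs the least, until a stopping rule is reached.

    min_r  : stop before the correlation with the full composite falls below it
    n_keep : stop once this many predictors remain
    """
    r_xx = np.asarray(r_xx, dtype=float)
    w = np.asarray(w, dtype=float)
    keep = list(range(len(w)))
    while len(keep) > 1:
        if n_keep is not None and len(keep) <= n_keep:
            break
        trials = [(multiple_r_with_composite(r_xx, w, [k for k in keep if k != j]), j)
                  for j in keep]
        best_r, drop = max(trials)
        if min_r is not None and best_r < min_r:
            break
        keep.remove(drop)
    return keep

# Hypothetical example: four uncorrelated predictors.
r_xx = np.eye(4)
w = np.array([3.0, 2.0, 1.0, 0.1])
kept_95 = backward_reduce(r_xx, w, min_r=0.95)
kept_top3 = backward_reduce(r_xx, w, n_keep=3)
```

In the toy example the two smallest weights can be dropped while the
reduced composite still correlates about .96 with the full one;
dropping a third would lower the correlation to about .80.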

"Empirical  Weights. "  In  addition  to  the  synthetically 
produced  predictor  composites,  we  developed  "empirical" 
prediction  equations  using  least-squares  regression  of  the  26 
predictor measures against the Core Technical and Overall
Performance criterion composites within each of the 18 MOS.

We  also  developed  equations  for  each  MOS  using  the  three 
ASVAB  predictors  against  the  Core  Technical  criterion  and  against 
the Overall Performance criterion. When the same empirical data
that were used to develop the empirical composites were also used
to estimate their validity (e.g., when the equation
developed on the 19K sample was applied to the 19K sample), we
applied adjustments to yield unbiased estimates of
cross-validated coefficients for these composites. We used three
different adjustments to correct the bias. Two of these were
from Claudy (1978). One provided an estimate of the population
multiple correlation coefficient (i.e., the coefficient that
would result if one could obtain the actual population weights
for the least squares equation; Equation 12, p. 603) and one
provided  an  estimate  of  the  validity  coefficient  in  the 
population  for  the  sample-derived  weights  (unnumbered  equation, 
p.  606).  These  two  methods  of  adjustment  were  arrived  at  through 
empirical  means  based  on  some  Monte  Carlo  work.  The  third 
adjustment  is  from  Rozeboom  (1978,  Equation  8,  p.  1350),  based  on 
Browne's  earlier  work,  and  also  provides  an  estimate  of  the 
population  validity  coefficient  for  the  sample-derived  weights. 
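
The three adjustments can be sketched with the forms commonly
attributed to Claudy (1978) and Rozeboom (1978). These forms are
stated here as assumptions: the equation numbers cited above were
not checked against the original papers. They do, however,
reproduce the entries reported in Table 4.9 for 13B and 51B to
three decimals when the sample sizes from Table 4.1 and k = 26
predictors are used. The function names are hypothetical.

```python
import math

def claudy_population_r(r, n, k):
    """Estimate of the population multiple correlation.

    r : foldback (sample) multiple correlation
    n : sample size
    k : number of predictors
    """
    u = 1.0 - r * r
    rho_sq = 1.0 - ((n - 4) * u / (n - k - 1)) * (1.0 + 2.0 * u / (n - k + 1))
    return math.sqrt(max(rho_sq, 0.0))

def claudy_validity(r, n, k):
    """Estimate of the population validity of the sample-derived weights."""
    return 2.0 * claudy_population_r(r, n, k) - r

def rozeboom_validity(r, n, k):
    """Estimate of the population validity of the sample-derived weights."""
    rho_sq = 1.0 - ((n + k) / (n - k)) * (1.0 - r * r)
    return math.sqrt(max(rho_sq, 0.0))

# 13B, Core Technical Proficiency: foldback r = .492, N = 464, k = 26.
pop_r_13b = claudy_population_r(0.492, 464, 26)
claudy_13b = claudy_validity(0.492, 464, 26)
rozeboom_13b = rozeboom_validity(0.492, 464, 26)
```

The shrinkage grows as k approaches n, which is why the adjustment
is largest for the small 51B sample and nearly negligible when only
three predictors are used.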

We had used Claudy's estimate of the population multiple
correlation coefficient in the first two phases, but decided that
an estimate of the validity coefficient in the population when
using the sample-derived weights was the more appropriate
estimate for the actual applied problem of predicting job
performance for future Army applicants using weights derived from
Project A samples. We wished to continue to provide the
previously used estimate as well as to try out two different
estimates of the validity coefficient, one (Claudy's) that was
empirically based and one (Rozeboom's) that was derived
analytically and
fairly  widely  accepted  as  an  accurate  estimate  of  the  validity  of 
sample-derived  weights  (see  Mitchell  and  Klimoski,  1986). 

On  the  other  hand,  no  adjustments  were  made  when  we 
estimated  the  validity  of  the  empirical  equation  developed  for 
one  job  for  predicting  performance  in  a  different  job.  This  is 
because  the  criterion  data  for  the  other  jobs  were  not  used  in 
the  development  of  the  empirical  weights,  therefore  removing  the 
possibility of positive bias due to error-fitting.

Analyses


Figure 4.5 shows a representation of the synthetic equations
that were created from the variations in method described above.
Each row in this figure represents a set of 18 equations (one
equation for each of the MOS) and describes the criterion it was
designed to predict, the type of component-by-job weights, the
type of attribute-by-component weights, and the method used to
reduce the number of predictors in the equation. The "Run ID#"
corresponds to the table numbers in Appendices E and F. This
order was chosen for clarity, and does not represent a necessary
sequencing of the analyses. For example, run #1 was designed to
predict Core Technical Proficiency (and thus the Core Technical
importance ratings were used for component-by-job weights), "MOS
Mean" component-by-job weights were used, mean validity ratings
were used for attribute-by-component weights, and the reduction
method was "none," i.e., no reduction was done. Table 4.5 shows
the  normalized  weights  for  each  of  the  26  attribute  measures  for 
each  of  the  18  MOS  for  run  #1.  (See  Appendix  E,  page  v,  for  a 
key  to  the  attribute  abbreviations  used  as  column  headings.)  By 
way  of  comparison,  Table  4.6  shows  the  least  squares  Beta  weights 
("empirical  weights")  for  the  26  attribute  measures  for 
predicting  Core  Technical  Proficiency  for  each  of  the  18  MOS. 
(Tables  showing  weights  for  all  methods  are  found  in  Appendix  E.) 

As  shown  in  Figure  4.5,  there  were  40  "runs"  or  different 
types  of  synthetic  equations  computed  for  each  of  the  18  MOS. 
Table  4.7  shows  the  results  of  one  such  run,  i.e.,  the 
correlations  of  the  18  synthetic  composites  depicted  in  Table  4.5 
with  Core  Technical  Proficiency  for  the  18  MOS.  The  correlations 
on  the  diagonal  in  this  table  represent  the  absolute  validities 
of  the  composites  (i.e.,  the  correlations  between  each  composite 
and  Core  Technical  Proficiency  in  the  particular  MOS  for  which  it 
was  developed),  whereas  the  correlations  on  the  off-diagonal 
represent  the  validities  of  the  composites  for  predicting  Core 
Technical Proficiency in the other MOS. Note that the upper and
lower triangles of this matrix are most easily interpreted row by
row; for example, the "11B" row shows the validity coefficients
obtained when the equation developed for 11B is applied to all
the MOS, and the "12B" row shows the validity coefficients
obtained when the equation developed for 12B is applied to all
MOS. Table 4.8 shows similar results for the "empirical" or
least  squares  composites  for  predicting  Core  Technical 
Proficiency. 
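
A validity matrix of this kind can be summarized as in the sketch
below. The definition of discriminant validity used here is an
assumption made for illustration: for each MOS, the validity of its
own equation minus the mean validity, in that MOS, of the equations
developed for the other MOS. The function name and the toy matrix
are hypothetical.

```python
def summarize_validity_matrix(v):
    """Return (mean absolute validity, mean discriminant validity).

    v[i][j] is the validity of the equation developed for MOS i when it is
    applied to MOS j, so the diagonal holds the absolute validities.
    """
    n = len(v)
    absolute = [v[i][i] for i in range(n)]
    discriminant = []
    for j in range(n):
        # Validities in MOS j of equations developed for the other MOS.
        others = [v[i][j] for i in range(n) if i != j]
        discriminant.append(v[j][j] - sum(others) / len(others))
    return sum(absolute) / n, sum(discriminant) / n

# Hypothetical 2 x 2 example.
toy = [[0.60, 0.50],
       [0.40, 0.70]]
mean_absolute, mean_discriminant = summarize_validity_matrix(toy)
```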

Table  4.9  shows  the  results  for  the  least  squares  equations 
developed  for  each  of  the  18  MOS  when  all  26  predictors  are  used 
to  predict  Core  Technical  Proficiency  and  Overall  Performance. 
Shown  are  the  foldback  multiple  correlation  coefficient  (r),  the 
Claudy  estimate  of  the  population  multiple  correlation 
coefficient  (Claudy  Pop.  R),  the  Claudy  estimate  of  the  validity 
of  the  sample  weights  in  the  population  (Claudy  Vldty),  and  the 
Rozeboom  estimate  of  the  validity  of  the  sample  weights  in  the 
population (Rozeboom Vldty). The table also shows the means
across all 18 MOS for the various coefficients. Note that the
foldback coefficient (e.g., when the equation developed on the
19K sample was applied to the 19K sample) is always highest, of
course, followed in order by the Claudy population R, the Claudy
validity estimate, and the Rozeboom validity estimate. The two
validity estimates are about .03 lower than the Claudy population
R estimate, as expected. More importantly, however, note that the
two validity estimates are very close in magnitude: the mean
estimates differ by .006 for Core Technical Proficiency and by
.007 for Overall Performance. Table 4.10 shows the same set of
results  when  only  the  three  ASVAB  predictors  enter  the  equation. 
In  this  case,  there  is  much  less  shrinkage  because  so  few 
predictors  are  used  compared  to  the  full  set  of  predictors  (3 
versus  26).  Note  that  the  same  pattern  of  results  still  holds, 
however,  and  we  again  see  a  small  difference  between  the  two 
validity  estimates. 

Table  4.11  shows  the  absolute  and  discriminant  validities 
for  each  of  the  40  synthetic  validation  equations.  Each  of  the 
40  absolute  validities  shown  in  this  table  is  the  mean  validity 
computed  across  the  18  MOS.  In  order  to  provide  an  estimate  of 
the  statistical  significance  of  the  differences  between  these 
absolute  validities,  we  computed  the  two-way  analysis  of  variance 
with  MOS  (18  levels)  and  Method  (40  levels)  as  main  effects,  and 
the  MOS  x  Method  interaction  as  the  error  term.  The  mean  squares 
(rounded  to  thousandths)  from  the  ANOVA  were  .021  for  Method 
(F[39,663]  =  12.29,  p<.001),  .269  for  MOS  (F[17,663]  =  156.38, 
p<.001),  and  .002  for  the  interaction  effect,  which  is  the  mean 
squared  error.  The  value  of  the  interaction  effect  is  the 
standard  error  for  comparing  the  absolute  validities.  Thus,  the 
95%  confidence  interval  is  plus  or  minus  .004  around  each 
coefficient.  Basically,  this  means  that  a  difference  of  .01 
between  absolute  validity  coefficients  is  statistically 
significant.  This  level  of  difference  is  probably  not 
practically significant, but it should be kept in mind that even
very small differences in validity can be meaningful for
organizations with a huge volume of annual selection decisions
(i.e., the U.S. Army).
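
The structure of this analysis (two main effects, one observation
per cell, interaction used as the error term) can be sketched as
follows. This is a generic illustration with a hypothetical function
name and toy numbers, not a reanalysis of the project data.

```python
def two_way_anova(table):
    """Two-way ANOVA with one observation per cell.

    table[i][j] is the value for row factor level i (e.g., MOS) and column
    factor level j (e.g., Method). The row x column interaction serves as
    the error term.
    """
    r, c = len(table), len(table[0])
    grand = sum(map(sum, table)) / (r * c)
    row_means = [sum(row) / c for row in table]
    col_means = [sum(col) / r for col in zip(*table)]
    ss_rows = c * sum((m - grand) ** 2 for m in row_means)
    ss_cols = r * sum((m - grand) ** 2 for m in col_means)
    ss_err = sum((table[i][j] - row_means[i] - col_means[j] + grand) ** 2
                 for i in range(r) for j in range(c))
    df_rows, df_cols, df_err = r - 1, c - 1, (r - 1) * (c - 1)
    ms_rows, ms_cols, ms_err = ss_rows / df_rows, ss_cols / df_cols, ss_err / df_err
    return {"df": (df_rows, df_cols, df_err),
            "ms": (ms_rows, ms_cols, ms_err),
            "F": (ms_rows / ms_err, ms_cols / ms_err)}

# Hypothetical validities: 3 MOS (rows) by 3 methods (columns).
toy = [[0.60, 0.62, 0.58],
       [0.50, 0.55, 0.49],
       [0.70, 0.71, 0.72]]
result = two_way_anova(toy)
```

With 18 MOS and 40 methods the error degrees of freedom are
17 x 39 = 663, as reported. Because each F ratio is a mean square
divided by the error mean square, the reported Method values imply
an error mean square of about .021 / 12.29, or roughly .0017, which
rounds to the .002 given above.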

Also  shown  in  Table  4.11,  for  comparison,  are  the  absolute 
and  discriminant  validities  for  the  least  squares  equations, 
using  the  Rozeboom  validity  estimates  to  compute  the  absolute 
validities.  The  complete  set  of  validity  matrices  appears  in 
Appendix  F. 

General  summary  of  results.  In  summary,  the  synthetic 
equations  produced  high  levels  of  absolute  validity  and  very  low 
levels  of  discriminant  validity.  The  lowest  absolute  validity 
for  a  synthetic  equation  in  Table  4.11  is  .55,  whereas  the 
highest  discriminant  validity  is  .02.  The  values  for  the  least 
squares  equations  in  Table  4.11  show  the  maximum  values  that  we 
might  expect  for  these  data,  and  these  show  discriminant 
validities  for  the  full  set  of  predictors  of  .06  for  Core 
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Table  4.9 

Validity  Estimates  for  18  MOS  Using  All  26  Predictors  to  Predict 
Core  Technical  Proficiency  and  Overall  Performance 


Core  Technical 

Proficiency 

Overall  Performance 

MOS 

r1 

Claudy 
Pop.  R2 

Claudy 

Vldty3 

Rozeboom 

Vldty4 

MOS 

r1 

Claudy 
Pop.  R2 

Claudy  Rozeboom 
Vldty4  Vldty4 

11B 

.747 

.732 

.716 

.713 

11B 

.699 

.680 

.661 

.657 

12B 

.702 

.685 

.668 

.665 

12B 

.679 

.660 

.641 

.638 

13B 

.492 

.447 

.401 

.390 

13B 

.498 

.454 

.409 

.398 

16S 

.585 

.539 

.493 

.482 

16S 

.651 

.616 

.580 

.573 

19K 

.677 

.650 

.623 

.618 

19K 

.708 

.685 

.661 

.656 

27E 

.861 

.823 

.785 

.776 

27E 

.851 

.810 

.769 

.759 

31C 

.685 

.648 

.612 

.604 

31C 

.633 

.587 

.542 

.531 

51B 

.932 

.892 

.852 

.842 

51B 

.914 

.862 

.810 

.798 

54B 

.776 

.756 

.736 

.732 

54B 

.728 

.703 

.678 

.672 

55B 

.756 

.716 

.677 

.668 

55B 

.616 

.542 

.467 

.444 

63B 

.747 

.731 

.715 

.712 

63B 

.623 

.596 

.569 

.564 

67N 

.843 

.824 

.804 

.800 

67N 

.838 

.818 

.798 

.793 

71L 

.681 

.657 

.633 

.628 

71L 

.650 

.623 

.595 

.590 

76Y 

.686 

.663 

.641 

.636 

76Y 

.613 

.583 

.552 

.546 

88M 

.640 

.616 

.593 

.588 

88M 

.601 

.573 

.546 

.541 

91A 

.784 

.768 

.752 

.748 

91A 

.680 

.653 

.627 

.621 

94B 

.771 

.752 

.734 

.730 

94B 

.635 

.601 

.566 

.559 

95B 

.676 

.659 

.642 

.638 

95B 

.733 

.720 

.706 

.704 

Mean 

.725 

.697 

.671 

.665 

Mean 

'.686 

.654 

.621 

.614 

‘Foldback  Multiple  Correlation  Coefficient.  zClaudy  Estimate  of  Population 
Multiple  Coefficient.  3Claudy  Estimate  of  Population  Validity  of  Sample 
Weights.  4Rozeboom  Estimate  of  Population  Validity  of  Sample  Weights. 

Table 4.10

Validity Estimates for 18 MOS Using 3 ASVAB Predictors to Predict
Core Technical Proficiency and Overall Performance

      Core Technical Proficiency                      Overall Performance

MOS   R(1)      Claudy    Claudy    Rozeboom    MOS   R(1)      Claudy    Claudy    Rozeboom
                Pop. R(2) Vldty(3)  Vldty(4)                    Pop. R(2) Vldty(3)  Vldty(4)

11B   .680      .679      .678      .675        11B   .580      .578      .577      .573
12B   .671      .670      .669      .666        12B   .581      .580      .578      .575
13B   .388      .384      .380      .373        13B   .355      .350      .346      .339
16S   .508      .505      .502      .495        16S   .514      .511      .508      .501
19K   .627      .625      .624      .620        19K   .602      .600      .599      .594
27E   .787      .785      .784      .775        27E   .760      .758      .756      .746
31C   .635      .633      .631      .625        31C   .550      .547      .544      .537
51B   .837      .835      .834      .821        51B   .741      .737      .733      .713
54B   .727      .726      .725      .721        54B   .652      .650      .649      .644
55B   .713      .711      .710      .703        55B   .497      .491      .486      .474
63B   .680      .679      .678      .675        63B   .489      .487      .484      .479
67N   .811      .810      .810      .806        67N   .797      .796      .796      .791
71L   .599      .597      .596      .591        71L   .556      .554      .552      .547
76Y   .646      .645      .644      .640        76Y   .543      .541      .539      .534
88M   .595      .594      .592      .589        88M   .516      .514      .512      .507
91A   .726      .725      .724      .721        91A   .573      .571      .569      .564
94B   .695      .694      .693      .689        94B   .514      .511      .508      .502
95B   .627      .626      .625      .622        95B   .656      .655      .654      .652

Mean  .664      .662      .661      .656        Mean  .582      .580      .577      .571

(1) Foldback Multiple Correlation Coefficient. (2) Claudy Estimate of Population
Multiple Correlation Coefficient. (3) Claudy Estimate of Population Validity of
Sample Weights. (4) Rozeboom Estimate of Population Validity of Sample Weights.
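
The shrinkage adjustment named in the last table footnote can be illustrated with a short Python sketch. It uses a commonly cited form of Rozeboom's (1978) estimator, 1 - [(N + p)/(N - p)](1 - R^2); that this is the exact equation applied in the report is an assumption, as are the function name `rozeboom_validity` and the sample size of 500, which the table does not give.

```python
from math import sqrt


def rozeboom_validity(r, n, p):
    """Estimate the population validity of sample regression weights.

    r: sample (foldback) multiple correlation
    n: sample size (hypothetical here; not reported with the table)
    p: number of predictors in the equation
    """
    if n <= p:
        raise ValueError("sample size must exceed the number of predictors")
    rho_sq = 1 - ((n + p) / (n - p)) * (1 - r ** 2)
    # A negative estimate means no dependable validity; report it as zero.
    return sqrt(max(rho_sq, 0.0))


# Foldback R of .680 with 3 predictors and an assumed N of 500.
print(round(rozeboom_validity(0.680, n=500, p=3), 3))
```

With three predictors and a sample of this size the adjustment is only a few thousandths, which is in line with the small differences between adjacent columns of Table 4.10.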

Technical and .01 for Overall, and discriminant validities for
the ASVAB only of .03 for Core Technical and .01 for Overall
Performance. Thus, in the best case (0-1 or 0-mean attribute
weights with MOS threshold component weights for Core Technical),
it appears that the synthetic equations obtain 93% of the
absolute validity and 33% of the discriminant validity of the
least squares equations.

The  criterion  predicted.  In  general,  the  Core  Technical 
Proficiency  criterion  appears  to  be  better  predicted  than  the 
Overall  Performance  criterion,  although  both  are  well  predicted 
by the synthetic methods. This is especially so for synthetic
equations using 0-1 or 0-mean validity weights for the
attribute-by-component weights (r of .62 or .64 vs. .56 to .59). Synthetic
equations  containing  only  ASVAB  predictors  also  do  not  predict 
the  Overall  Performance  criterion  as  well  as  the  Core  Technical 
Proficiency  criterion  (r  of  .64  or  .65  vs.  .57).  Given  the 
generally  low  level  of  discriminant  validity  obtained,  it  is  not 
surprising  that  there  is  little  difference  between  the 
discriminant  validities  for  the  two  criteria. 

Attribute-by-component weights. The 0-1 weights and 0-mean
weights produced nearly identical results and showed higher
absolute validities than did the mean validity weights,
especially  for  Core  Technical  Proficiency  (r  of  .62  to  .64  vs.  r 
of  .55  to  .57).  The  0-1  and  0-mean  weights  produced  slightly 
higher  levels  of  discriminant  validity,  but  these  were  still  very 
low  (no  more  than  .02). 

Component-by-job  weights.  Variations  in  methods  of  forming 
these  weights  appeared  to  have  little  impact  on  either  absolute 
validity  or  discriminant  validity,  although  there  does  appear  to 
be  a  small  reduction  in  absolute  validity  for  predicting  Overall 
Performance when using threshold weights.

Method of reducing the number of predictors in the synthetic
equation. The two stepwise reduction methods produced almost
identical  results,  about  .56  absolute  validity  and  .00  or  .01 
discriminant  validity.  This  is  not  too  surprising  when  one 
considers  that  the  .95  stepwise  reduction  method  produced 
equations  having  seven  or  eight  predictors  and  that  the  top  five 
stepwise  reduction  method  produced  equations  that  correlated 
about  .92  or  .93,  on  average,  with  the  full  synthetic  equation 
(mean  validities  and  MOS  mean  component  weights,  no  reduction 
method  applied).  Also,  inspection  of  the  attribute  weights 
produced  by  the  two  methods  (see  Appendix  E  Tables  4,  5,  14,  15, 
24,  25,  34,  and  35)  shows  considerable  overlap  in  the  attributes 
that  are  weighted.  Generally  speaking,  the  attributes  most 
frequently  weighted  across  reduction  method  and  MOS  were:  Verbal 
Ability,  Reasoning,  Spatial  Ability,  Memory,  Eye-Limb 
Coordination,  Work  Orientation,  Interest  in  Using  Tools,  Interest 
in Technical Activities, and Interest in Leadership. Thus, the
two methods produced equations that stepped down from 26
predictors to 5-8 predictors and correlated about .92 to .95 with
the full equation. The remarkable observation about these
results  is  the  consistency  of  obtained  validity  for  the  reduced 
equations  compared  to  the  full  equations  (the  first  column 
contains  the  full  equation  validity  corresponding  to  the  reduced 
equation).  The  difference  in  the  validities  is  never  more 
than  .02.  This  demonstrates  that  the  stepwise  method  is 
preserving  the  level  of  validity  in  the  original,  non-reduced 
equation. 

Use  of  only  ASVAB  predictors  produces  validities  for  Core 
Technical  Proficiency  that  are  equal  to  (or  .01  higher  than)  the 
validities  for  the  best  non-reduced  synthetic  equations.  This  is 
not  too  surprising  since  Project  A  results  have  already  shown 
that  the  new  predictors  developed  for  Project  A  provide  no 
incremental  validity  for  predicting  this  criterion.  No 
discriminant  validity  was  found  for  the  ASVAB-only  synthetic 
equations,  also  not  surprising  given  that  there  was  very  little 
discriminant validity (.03) for the least squares equations.

With  regard  to  predicting  Overall  Performance,  the  ASVAB- 
only  equations  do  equally  well  or  .02  lower  when  compared  to  the 
non-reduced,  synthetic  equations.  The  Overall  Performance 
criterion  is  a  weighted  sum  of  all  five  Project  A  criterion 
constructs,  so  we  might  have  expected  a  bit  more  improvement  when 
all  the  predictors  were  included.  No  discriminant  validity  was 
obtained  here  either,  but  there  was  even  less  available  since  the 
discriminant  validity  for  the  least  squares  equations  was 
just  .01. 


Comparison  of  Synthetic  Validation  Model  to 
Validity  Generalization  Model 

The  synthetic  validation  model  is  one  method  of  developing  a 
prediction  equation  for  jobs  for  which  no  empirical  validity  data 
are  available,  for  whatever  reason.  Other  models  exist  for  this 
purpose,  notably  the  validity  generalization  or  validity 
transportability  model  (Schmidt  &  Hunter,  1981).  Very  briefly 
described,  in  this  model  "new"  jobs  or  jobs  for  which  appropriate 
empirical  data  do  not  exist  are  compared  to  "existing"  jobs  for 
which  such  data  do  exist.  If  a  match  is  made  between  a  new  job 
and  an  existing  job,  then  the  validity  evidence  for  the  existing 
job  is  deemed  to  be  relevant  for  the  new  job.  This  allows  the 
selection  methods  for  the  existing  job  to  be  used  for  the  new 
job. Of course, new jobs need not be matched to specific
existing jobs. They could be matched to clusters of existing
jobs,  or,  in  the  extreme,  research  could  be  carried  out  to 
demonstrate  that  one  equation  could  serve  to  predict  performance 
for  all  jobs  in  an  organization  (or  however  the  population  of 
relevant  jobs  is  defined).  In  order  for  the  synthetic  validation 
model  to  receive  serious  consideration,  it  must  provide  validity 
results  at  least  comparable  to  those  provided  by  the  validity 
transportability  model. 

There is no universal agreement on the appropriate data or
index  to  determine  the  degree  to  which  a  new  job  matches  an 
existing  job.  However,  the  completion  of  a  structured  task 
questionnaire,  such  as  the  Army  Task  Questionnaire,  by 
appropriate  samples  of  experts  on  the  target  jobs  appears  to  us 
to  provide  appropriate  data  for  matching  jobs.  Computation  of 
correlations  between  the  mean  task  profiles  for  target  jobs 
should  also  provide  an  appropriate  index  of  the  extent  to  which 
jobs  are  similar  in  terms  of  the  task  categories  that  must  be 
performed  on  the  job. 

Method  of  Comparison 

We  carried  out  a  comparison  of  the  transportability  and 
synthetic  models  as  described  below. 

For  these  analyses,  we  considered  9  of  the  18  MOS  in  the 
Project  A  data  base  to  be  the  "existing"  jobs  and  9  of  them  to  be 
the  "new"  jobs.  The  "existing"  jobs  were  the  Batch  A  MOS  for 
which  we  had  the  more  comprehensive  data  sets  and,  generally 
speaking,  larger  samples.  The  "new"  jobs  were  the  nine  Batch  Z 
jobs. We consider this to be an extremely powerful simulation of
the  actual  applied  situation  that  the  Army  must  face. 

First,  we  computed  correlations  between  the  Army  Task 
Questionnaire  profiles  (on  mean  ratings  of  importance  for  Core 
Technical  Proficiency)  for  the  Batch  A  and  Batch  Z  MOS  in  order 
to  identify  the  Batch  A  MOS  (the  "existing"  job)  that  was  most 
similar  to  each  Batch  Z  MOS  (the  "new"  job).  We  defined  "most 
similar  to"  as  "most  highly  correlated  with"  for  these 
investigations. We also identified the Core Technical cluster to
which  each  Batch  A  and  Batch  Z  MOS  belonged  (see  Figure  4.2). 
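
The matching rule in this first step can be sketched in Python as follows. The task profiles are invented for illustration, since the underlying mean importance ratings are not listed here, and the names `pearson_r`, `closest_match`, `JOB_A`, and `JOB_B` are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length rating profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def closest_match(new_profile, existing_profiles):
    """Return (job, r) for the existing job whose profile correlates highest."""
    scored = {job: pearson_r(new_profile, profile)
              for job, profile in existing_profiles.items()}
    best = max(scored, key=scored.get)
    return best, scored[best]


# Hypothetical mean importance ratings on six task categories.
existing = {
    "JOB_A": [4.2, 3.8, 1.0, 0.5, 2.0, 3.1],
    "JOB_B": [0.8, 1.2, 4.5, 3.9, 2.2, 1.0],
}
new_job = [3.9, 3.5, 1.4, 0.9, 2.5, 2.8]

best_job, best_r = closest_match(new_job, existing)
print(best_job, round(best_r, 2))
```

Defining "most similar" as "most highly correlated" makes the match depend on the shape of the profile rather than on its overall level, so two jobs with uniformly different rating levels can still match closely.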

Second,  we  applied  the  empirical  least  squares  equation  for 
the  "existing"  job  that  most  closely  matched  each  "new"  job  to 
the  "new"  jobs  and  correlated  the  resulting  composite  scores  with 
Core  Technical  Proficiency.  This  provided  an  estimate  of  the 
absolute  transported  or  generalized  validity  for  each  new  job. 

We  also  computed  the  off-diagonal  validity  coefficients,  i.e., 
the  correlations  of  the  least  squares  equations  for  those 
"existing"  jobs  which  were  not  most  similar  to  the  new  jobs.  The 
difference  between  the  mean  absolute  validity  and  the  mean  off- 
diagonal  validity  provided  an  estimate  of  the  discriminant 
validity  for  the  method. 
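
The absolute and discriminant validity computation described above can be sketched in Python using the coefficients reported in Table 4.14. The variable names are ours, and the closest-match assignments follow the asterisked entries of that table.

```python
batch_z = ["12B", "16S", "27E", "51B", "54B", "55B", "67N", "76Y", "94B"]

# Rows: Batch A equation. Columns: Batch Z MOS, in the order of batch_z.
validity = {
    "11B": [.64, .50, .70, .88, .71, .67, .77, .56, .65],
    "13B": [.62, .50, .65, .97, .70, .70, .78, .43, .59],
    "19K": [.63, .50, .70, .83, .72, .58, .83, .56, .65],
    "31C": [.59, .46, .74, .86, .72, .66, .74, .64, .66],
    "63B": [.55, .31, .61, .80, .62, .62, .76, .38, .47],
    "71L": [.45, .50, .54, .76, .63, .47, .59, .59, .70],
    "88M": [.64, .45, .65, .84, .72, .62, .80, .55, .63],
    "91A": [.59, .48, .67, .89, .72, .64, .84, .54, .64],
    "95B": [.60, .53, .66, .82, .72, .57, .75, .61, .68],
}

# Most similar Batch A MOS for each Batch Z MOS (asterisked in Table 4.14).
match = {"12B": "11B", "16S": "11B", "27E": "31C", "51B": "88M", "54B": "13B",
         "55B": "88M", "67N": "63B", "76Y": "71L", "94B": "88M"}

matched, unmatched = [], []
for equation, row in validity.items():
    for mos, r in zip(batch_z, row):
        (matched if match[mos] == equation else unmatched).append(r)

# Absolute validity: mean coefficient for the matched equations.
absolute = sum(matched) / len(matched)
# Discriminant validity: matched mean minus the mean of all other cells.
discriminant = absolute - sum(unmatched) / len(unmatched)
print(f"absolute = {absolute:.2f}, discriminant = {discriminant:.2f}")
```

Run on these values, the sketch gives an absolute validity of about .67 and a discriminant validity of about .03, the figures reported for the Batch A "MOS-Match" method in Table 4.17.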

Third,  we  developed  a  least  squares  empirical  equation  for 
each  of  the  four  Core  Technical  clusters  by  using  the  pooled 
predictor-Core  Technical  criterion  correlations  (pooled  across 
all  Batch  A  MOS  in  a  cluster)  together  with  the  predictor 
correlations  computed  across  the  entire  Project  A  Concurrent 
Validation  sample.  These  matrices  had  already  been  corrected  for 
range  restriction  due  to  selection  into  the  Army  and  the  MOS,  so 
it  was  appropriate  to  carry  out  the  pooling.  Table  4.12  shows 
the  weights  developed  for  the  four  clusters,  as  well  as  the 

Table 4.12

Normalized Regression Weights for the Least Squares Equations
Computed for the Four Batch A MOS Clusters and the General
Batch A Group (All Nine Batch A MOS)

                             Cluster

Predictor        M        A        C        E        G

Verb         -.016     .247     .169     .251     .151
Reas          .196     .232     .140     .049     .172
Numb          .050     .175     .119     .385     .131
Spat         -.043     .041     .078    -.040     .039
InPr          .032     .040     .014    -.024     .021
PS&A          .070     .017     .048    -.016     .045
Mem           .025    -.006     .022     .008     .015
Mech          .399    -.022     .093     .172     .139
ELC          -.001    -.036    -.021     .003    -.019
Prec         -.017     .016     .019     .056     .011
MJud         -.006     .016    -.003    -.094    -.014
Dext          .020     .029     .048    -.010     .042
Athl         -.054    -.066    -.021    -.003    -.043
WkOr          .013     .090    -.005     .091     .034
Coop          .030     .017    -.007    -.094    -.007
Ener         -.022    -.078    -.004     .008    -.017
Cons          .104     .111     .085     .095     .091
Dom          -.030    -.038     .045    -.109    -.004
Tool          .157    -.025     .026     .118     .063
Rugd          .062     .064     .090    -.093     .072
Prot         -.033     .020     .004     .016    -.003
Tech         -.073    -.051    -.000     .108    -.036
Sci           .000     .035    -.059     .052    -.038
Lead          .031     .007     .038    -.065     .039
Art          -.026     .034    -.015    -.047     .005
Org          -.037    -.026    -.048     .032    -.048

Intercorrelations of Least Squares Composites Created for the Four
Batch A MOS Clusters and the General Batch A Group

                 M        A        C        E        G

M            1.000     .806     .906     .794     .932
A             .806    1.000     .946     .874     .953
C             .906     .946    1.000     .856     .993
E             .794     .874     .856    1.000     .878
G             .932     .953     .993     .878    1.000

Note. M = Mechanical/Construction Cluster; A = Administration/
Support Cluster; C = Combat Cluster; E = Electronics Cluster;
G = General Cluster formed from all Batch A MOS.

intercorrelations of the predicted scores computed from the
equations. The appropriate cluster equation for each Batch Z MOS
(i.e., the equation for the cluster to which the MOS belonged)
was then used to compute a composite score which was correlated
with Core Technical Proficiency to provide an estimate of the
absolute validity for this method for each Batch Z MOS. The
estimate of discriminant validity was obtained by subtracting the
mean off-diagonal coefficient from the mean absolute validity
coefficient, as described above.
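
The way a least squares equation is derived from correlation matrices and then applied to another job can be sketched in Python. The sketch relies on two standard results for standardized predictors: the weights solve R_xx b = r_xy, and the correlation of a weighted composite with a criterion is b'r_xy / sqrt(b'R_xx b). The two-predictor correlations are hypothetical, as are the names `solve` and `composite_validity`; the report's equations used 26 predictors.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def composite_validity(weights, r_xx, r_xy):
    """Correlation between a weighted predictor composite and a criterion."""
    n = len(weights)
    numerator = sum(w * r for w, r in zip(weights, r_xy))
    variance = sum(weights[i] * r_xx[i][j] * weights[j]
                   for i in range(n) for j in range(n))
    return numerator / variance ** 0.5


# Hypothetical values: predictor intercorrelations, pooled predictor-criterion
# correlations for an "existing" cluster, and the same for a "new" job.
r_xx = [[1.0, 0.5], [0.5, 1.0]]
r_xy_cluster = [0.6, 0.4]
r_xy_new_job = [0.5, 0.5]

weights = solve(r_xx, r_xy_cluster)
own = composite_validity(weights, r_xx, r_xy_cluster)
transported = composite_validity(weights, r_xx, r_xy_new_job)
print([round(w, 3) for w in weights], round(own, 3), round(transported, 3))
```

Because the composite is scored with weights fixed in advance, the transported coefficient is a cross-validity and needs no shrinkage adjustment, whereas the coefficient computed on the developing sample does.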

Fourth,  we  developed  a  least  squares  empirical  equation  for 
all  nine  Batch  A  MOS  by  pooling  across  all  nine  jobs.  The 
weights  for  this  equation  are  also  shown  in  Table  4.12.  This 
equation  was  applied  to  all  nine  Batch  Z  MOS  to  provide  an 
estimate  of  the  absolute  validity  for  a  "General"  model  of 
validity  transportability.  Of  course,  there  is  no  discriminant 
validity  possible  for  this  method  since  only  one  equation  is 
used. 


Fifth, we compared the validity coefficients (and the
absolute and discriminant validities derived from them) obtained
for the Batch Z MOS when these validity transportability models
are used with those obtained when each Batch Z MOS's "own"
empirical least squares equation is used, and with those obtained
when the various forms of the synthetic method are used.

In  summary,  we  had  matched  each  of  nine  "new"  jobs  to  a 
single  "existing"  job  and  to  a  single  cluster  of  "existing"  jobs. 
These  matches  provided  two  least  squares  equations  to  apply  to 
each  new  job.  In  addition,  we  had  a  "general"  least  squares 
equation  that  we  could  apply  to  each  new  job.  These  three 
"existing"  jobs  equations  provide  three  different  forms  of  the 
transportability  model.  The  synthetic  model  was  represented  by 
the  various  forms  of  synthetic  equations  described  earlier. 
Finally,  since  these  nine  "new"  jobs  were,  in  actuality,  included 
in  empirical  validation  research  as  part  of  Project  A,  we  had 
available  an  estimate  of  validity  for  the  case  when  an  empirical 
study  could  be  carried  out  for  a  "new"  job  (i.e.,  its  "own" 
equation).

Results 


Table  4.13  shows  the  correlations  between  Army  Task 
Questionnaire  profiles  for  the  Batch  A  and  Batch  Z  MOS.  The 
highest  correlation  in  each  column  represents  the  closest  match 
to an existing job (Batch A MOS) for a new job (Batch Z MOS).

All  of  the  Batch  Z  MOS  appear  to  have  an  acceptably  high 
correlation  with  a  Batch  A  MOS  (>  .70),  indicating  a  close  match, 
except  for  51B  (.58  correlation).  Also,  most  of  the  matches  do 
not  have  close  rivals.  With  the  exception  of  27E,  there  are  no 
other  column  correlations  that  are  only  .02  lower;  usually  they 
are  at  least  .06  or  .07  lower.  Note  that  12B  and  16S  both  match 
most  closely  with  11B,  and  that  51B,  55B,  and  94B  all  match  most 
closely  with  88M. 

Table 4.14 shows the validity coefficients produced when the
empirical least squares equations for the Batch A MOS are applied
to the Batch Z MOS. These are cross-validity coefficients and
require no shrinkage adjustment. The highest column entries, or
Batch Z validities, are underlined and the "most similar"
validity coefficient is asterisked. In only one case does the
highest validity for a Batch Z MOS occur for the "most similar"
Batch A equation, for 27E. Thus, using the "closest job match"
method as we have operationalized it would not produce the
highest validities possible from the set of existing jobs. In
general, however, the method does produce acceptably high
validities; the mean of the asterisked validity coefficients
is .67 with a standard deviation of .10.

While not directly relevant to the primary research question
addressed  in  this  section,  the  means  and  standard  deviations  of 
the  row  and  column  coefficients  do  provide  some  interesting 
information.  The  row  coefficients  provide  an  estimate  of  the 


Table 4.13

Correlations Between Army Task Questionnaire Profiles (Mean
Importance Ratings for Core Technical Proficiency) for Project A
Batch A and Batch Z MOS Included in the Synthetic Validity
Project: Highest Column Correlations Underlined

Batch A                          Batch Z MOS
MOS       12B    16S    27E    51B    54B    55B    67N    76Y    94B

11B      _.85_  _.92_   .41    .44    .81    .66    .54    .48    .49
13B       .65    .76    .45    .52   _.87_   .67    .63    .53    .54
19K       .66    .80    .52    .33    .72    .49    .58    .40    .38
31C       .59    .75   _.71_   .44    .79    .53    .71    .57    .54
63B       .67    .78    .69    .53    .79    .69   _.81_   .60    .63
71L       .54    .62    .45    .35    .67    .68    .57   _.82_   .67
88M       .73    .84    .55   _.58_   .83   _.79_   .76    .68   _.71_
91A       .45    .55    .40    .30    .61    .55    .53    .59    .60
95B       .77    .85    .41    .42    .80    .66    .66    .57    .53


Table 4.14

Validity Coefficients of Least Squares Equations for Predicting
Core Technical Proficiency, When Developed on Project A Batch A
MOS and Applied to Project A Batch Z MOS: Highest Column Entries
Underlined

Equation from                 Applied to Batch Z MOS
Batch A MOS   12B    16S    27E    51B    54B    55B    67N    76Y    94B    Mean   S.D.

11B           .64*   .50*   .70    .88    .71    .67    .77    .56    .65    .68    .10
13B           .62    .50    .65    .97    .70*   .70    .78    .43    .59    .66    .15
19K           .63    .50    .70    .83    .72    .58    .83    .56    .65    .67    .11
31C           .59    .46    .74*   .86    .72    .66    .74    .64    .66    .67    .11
63B           .55    .31    .61    .80    .62    .62    .76*   .38    .47    .57    .15
71L           .45    .50    .54    .76    .63    .47    .59    .59*   .70    .58    .10
88M           .64    .45    .65    .84*   .72    .62*   .80    .55    .63*   .66    .11
91A           .59    .48    .67    .89    .72    .64    .84    .54    .64    .67    .13
95B           .60    .53    .66    .82    .72    .57    .75    .61    .68    .66    .09

Mean          .59    .47    .66    .85    .70    .61    .76    .54    .63
S.D.          .06    .06    .05    .06    .04    .06    .07    .08    .06

*Validity coefficient for Batch Z MOS using the equation developed on the
Batch A MOS that is most similar in terms of ATQ profile correlation.
Mean = .67, S.D. = .10.


general  validity  for  a  particular  Batch  A  MOS  (existing  job) 
equation.  An  equation  with  a  relatively  high  mean  and  low 
standard  deviation  provides  generally  high  validity  across  all 
new  jobs,  while  an  equation  with  a  relatively  low  mean  and  high 
standard  deviation  provides  generally  low  validity  that  varies 
across  new  jobs.  There  is  some  difference  in  means  (they  range 
from  .57  for  63B  to  .68  for  11B),  but  the  standard  deviations  are 
similar.  The  means  and  standard  deviations  of  the  column 
coefficients  provide  information  about  the  relative 
predictability  of  the  Batch  Z  MOS  (new  jobs).  51B  and  67N  appear 
to  be  the  most  predictable,  while  16S  appears  to  be  the  least 
predictable.
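
The row summaries discussed above can be checked with a short Python sketch for two of the Batch A equations, using their rows from Table 4.14. The population form of the standard deviation (`pstdev`) is used because it reproduces the tabled values for these rows; that the report computed its standard deviations this way is our inference.

```python
from statistics import mean, pstdev

# Validity of each Batch A equation across the nine Batch Z MOS (Table 4.14).
rows = {
    "11B": [.64, .50, .70, .88, .71, .67, .77, .56, .65],
    "63B": [.55, .31, .61, .80, .62, .62, .76, .38, .47],
}

for equation, coefficients in rows.items():
    # A high mean with a low S.D. indicates generally high validity
    # across new jobs; a low mean with a high S.D. indicates the opposite.
    print(equation, round(mean(coefficients), 2), round(pstdev(coefficients), 2))
```

The 11B equation has the highest mean of the nine equations and the 63B equation the lowest, which is the range of means cited in the text.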

Table 4.15 presents the validity coefficients when the
"General"  and  cluster  equations  are  applied  to  the  Batch  Z  MOS. 
The  "appropriate"  cluster  coefficients  (i.e.,  those  found  for  the 
cluster  to  which  the  Batch  Z  MOS  belongs)  are  underlined.  The 
appropriate  coefficients  are  the  highest  of  the  cluster 
coefficients for five of the nine MOS (12B, 54B, 67N, 76Y, and
94B).  The  average  appropriate  validity  coefficient  was  .68  with 
a  standard  deviation  of  .08,  which  are  the  same  as  the  average 
and  standard  deviation  for  the  "General"  equation. 

Table  4.16  presents  the  validity  coefficients  for  each  Batch 
Z  MOS  or  new  job  for  its  "own"  empirical  equation  (the  validity 
coefficient  obtained  if  the  validity  research  could  actually  be 
carried  out,  as  it  was  for  these  jobs),  the  Batch  A  "MOS  Match" 
equation,  the  appropriate  Batch  A  cluster  equation,  the  Batch  A 
General  equation,  and  the  eight  forms  of  the  synthetic  validity 
equations.  Examination  of  the  row  of  mean  validity  coefficients 
in  this  table  shows  that  the  General  and  cluster  equations 
provide the highest average validity (.68), other than the "own"
equation,  followed  by  the  Batch  A  "MOS-Match"  equation  (.67). 

This  is  closely  followed  by  the  synthetic  equations  that  combine 
0-1  or  0-mean  attribute  weights  with  MOS  mean  component  weights 
(.66)  and  the  synthetic  equations  that  combine  0-1  or  0-mean 
attribute  weights  with  threshold  component  weights  (.65).  There 
is  virtually  no  difference  in  standard  deviations  of  the  validity 
coefficients;  they  range  from  .08  to  .10.  Thus,  all  of  the 
transportability  methods  provide  high  absolute  validities  as  do 
several of the synthetic methods, but only the Batch A cluster method
provides  absolute  validity  as  high  as  the  General  method. 

Table  4.17  shows  the  absolute  and  discriminant  validity 
coefficients  for  all  the  methods.  Note  that  the  discriminant 
validity  for  the  "own"  equation  is  .05,  which  we  have  regarded  as 
the theoretical upper limit for this type of validity. However,
the Batch A "MOS-Match" or Batch A Cluster discriminant
validity  probably  more  nearly  provides  the  theoretical  upper 
limit  for  the  applied  situation  for  which  the  transportability 
and  synthetic  models  are  intended,  that  is,  applying  information 
from  existing  jobs  to  new  jobs  for  which  "own"  equations  are  not 
available.  These  two  discriminant  validity  values  are  .03 
and  .01. 

Examination  of  Table  4.17  shows  that  the  Batch  A  "MOS  Match" 
method  achieves  an  appreciable  level  of  discriminant  validity 
(.03).  Discriminant  validities  for  the  other  transportability 
and  synthetic  methods  range  from  -.02  to  .02.  Several  methods 
achieve  acceptably  high  levels  of  absolute  validity.  The  General 
and  Batch  A  cluster  least  squares  equations  achieve  the  highest 
level of absolute validity. Unlike all of the other methods, the
General method requires no additional data about new jobs,
although the additional data the other methods require are not
especially costly to collect (completion of Army Task
Questionnaires by 15 to 30 SMEs).

Table 4.15

Validity Coefficients of General and Cluster Least Squares Equations
for Predicting Core Technical Proficiency, Developed on Batch A MOS
and Applied to Batch Z MOS

                          Validity Coefficients For:
Batch Z   General   Mechanical  Administrative  Combat    Electronics
MOS       Equation  Equation    Equation        Equation  Equation

12B       .65       .64         .60             .64       .61
16S       .51       .41         .54             .52       .48
27E       .74       .66         .71             .73       .71
51B       .87       .79         .86             .88       .78
54B       .74       .68         .72             .73       .70
55B       .69       .67         .64             .69       .64
67N       .78       .77         .73             .77       .70
76Y       .61       .52         .63             .59       .62
94B       .68       .57         .72             .67       .65

Mean      .68       .63         .68             .69       .65
S.D.      .08       .11         .09             .10       .08

Mean of appropriate cluster coefficients (underlined) = .68
S.D. of appropriate cluster coefficients (underlined) = .08

Note. The correlations of the 26 attributes with Core Technical
Proficiency for the four clusters (M, A, C, E) were estimated by the
pooled correlations of the Batch A MOS in each cluster and, for the
General group, by pooling all of the Batch A correlations.
Underlined coefficients indicate the appropriate cluster for each
MOS.

Table 4.17

Absolute and Discriminant Validity Coefficients for Predicting
Core Technical Proficiency (Computed Across Nine Batch Z MOS) for
Equations Developed from Various Methods

                                           Absolute    Discriminant
Equation                                   Validity    Validity

"Own" Least Squares                        .70(1)       .05
Batch A "MOS-Match" Least Squares          .67          .03
Batch A Cluster Least Squares              .68          .01
Batch A General Least Squares              .68          .00
Full Synthetic (Mean Attribute
  Validities and MOS Mean
  Component Weights)                       .56         -.01
Top 5 Stepwise Reduction                   .58         -.02
0-1 Attribute Weights                      .66          .00
0-Mean Attribute Weights                   .66          .00
Threshold Component Weights                .58          .00
0-1 Attribute Weights and
  Threshold Component Weights              .65          .01
0-Mean Attribute Weights and
  Threshold Component Weights              .65          .02

(1) The absolute validities for "own" least squares equations were
computed on coefficients adjusted with Rozeboom's equation #8
(1978). Other absolute validities were computed on coefficients
that did not require adjustments.


Conclusions:  Validity  of  Synthetic  Validity  Models 

We  have  attempted  to  identify  important  points  and 
conclusions  throughout  this  chapter.  In  this  section  we  discuss 
the  most  salient  conclusions. 

First,  and  most  importantly,  synthetic  validity  methods  in 
almost  any  form  provide  acceptably  high  levels  of  absolute 
validity  for  Core  Technical  and  Overall  Performance.  The  highest 
validities are achieved for predicting Core Technical Proficiency
in  which  the  attribute-by-component  weights  are  formed  by  giving 
zero  weight  for  cells  with  lower  estimates  of  validity  (validity 
coefficient  =  .64);  the  lowest  achieved  validities  are  .55  (see 
Table  4.11).  These  values  compare  favorably  to  validities 
achieved  by  using  least  squares  equations  developed  on  the  MOS 
themselves,  which  are  about  .67  for  Core  Technical  Proficiency. 

Synthetic  validity  methods  show  very  little  discriminant 
validity,  about  .01  or  .02  for  the  Core  Technical  criterion  and 
zero  or  .01  for  the  Overall  criterion.  However,  there  appears  to 
be  no  more  than  .06  discriminant  validity  available  for  the  Core 
Technical  criterion  (see  Table  4.11).  The  comparable  value  for 
the  Overall  criterion  is  .01.  With  regard  to  level  of 
discriminant  validity  for  these  data,  A.  Schwartz  (personal 
communication,  August  8,  1990)  has  completed  parallel  analyses  to 
those  completed  on  this  project,  using  the  ASVAB  Aptitude  Area 
Composites  as  the  prediction  equations.  Figure  4.6  shows  10 
Aptitude Area Composites, the ASVAB subtests making up each
composite,  and  the  MOS  in  the  synthetic  validity  project  to  which 
the  composites  are  applied.  Schwartz  computed  the  absolute  and 
discriminant  validities  of  these  Aptitude  Area  composites  for 
predicting  Core  Technical  Proficiency  and  obtained  values  of  .65 
(absolute  validity)  and  .02  (discriminant  validity).  Note  that 
the  discriminant  validity  value  is  the  same  as  that  achieved  by 
the  best  synthetic  equation,  and  the  absolute  validity  falls 
midway  between  the  synthetic  value  (.64)  and  the  least  squares 
value  for  the  ASVAB  reduction  (.66). 

In  summary,  it  appears  that  synthetic  validity  methods  can 
achieve  somewhere  in  the  neighborhood  of  96%  or  greater  of  the 
absolute  validity  achieved  by  least  squares  equations  developed 
on  the  MOS  themselves  (.64  divided  by  .67),  and  about  33%  of  the 
discriminant  validity  obtained  by  the  least  squares  method  (.02 
divided  by  .06). 
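
The two percentages in this summary follow directly from the ratios given in parentheses, as the short Python check below shows. The function name `percent_retained` is ours.

```python
def percent_retained(synthetic, least_squares):
    """Share of the least squares validity recovered by the synthetic method."""
    return 100 * synthetic / least_squares


# Absolute validity: best synthetic equation (.64) vs. least squares (.67).
print(round(percent_retained(.64, .67)))
# Discriminant validity: best synthetic equation (.02) vs. least squares (.06).
print(round(percent_retained(.02, .06)))
```

The first ratio is about 95.5%, which rounds to the 96% cited; the second is one third.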

The  most  important  comparison,  in  terms  of  the  operational 
viability  of  the  synthetic  methods  for  the  Army,  is  that  of 
comparing  transportability  methods  to  synthetic  methods,  since 
one  of  these  two  types  of  methods  must  be  used  to  develop  an 
equation  for  a  new  MOS  or  an  existing  MOS  for  which  empirical 
validation research cannot be completed. The analyses addressing
this comparison (see Table 4.17) show that the transportability
methods produce absolute and discriminant validities that are as
high as or higher than those of the synthetic methods. The Batch A Cluster
and  Batch  A  General  least  squares  methods  achieved  the  highest 
absolute  validity  (.68).  The  Batch  A  cluster  method  has 
discriminant  validity  of  .01,  while  there  is  zero  discriminant 
validity  for  the  General  method,  which  uses  a  single  least 
squares  equation  developed  across  nine  Army  MOS.  The  Batch  A 
"MOS-Match"  method  achieved  absolute  validity  of  .67,  only 
slightly  lower  than  the  Cluster  and  General  methods,  and  achieved 
the  highest  discriminant  validity  (.03)  of  any  synthetic  or 
transportability  method  investigated.  The  choice  between  the 
Batch A Cluster and MOS-Match methods hinges on the judgment of
whether  it  is  better  to  have  .01  greater  absolute  validity  at  the 
cost  of  .02  in  discriminant  validity.  Use  of  either  method 
assumes  the  appropriate  cluster  or  MOS  match  for  a  new  MOS  can  be 
identified.  The  method  used  in  this  project — obtaining  Army  Task 
Questionnaire  profiles  for  new  MOS  and  correlating  them  with 
profiles  for  "existing"  MOS — could  provide  this  information. 


ASVAB Subtests and Abbreviations

Arithmetic Reasoning (AR)
Auto and Shop Information (AS)
Coding Speed (CS)
Electronics Information (EI)
General Science (GS)
Mechanical Comprehension (MC)
Mathematics Knowledge (MK)
Numerical Operations (NO)
Verbal (VE)


ASVAB AA Composites                ASVAB Subtests        MOS Chosen
and Abbreviations                  in AA Composites      with AA Composites

Clerical (CL)                      AR + MK + VE          71L, 76Y
Combat (CO)                        AR + AS + CS + MC     11B, 12B, 19K
Electronics Repair (EL)            AR + EI + GS + MK     27E
Field Artillery (FA)               AR + CS + MC + MK     13B
General Maintenance (GM)           AS + EI + GS + MK     51B, 55B
General Technical (GT)             AR + VE               --
Mechanical Maintenance (MM)        AS + EI + MC + NO     63B, 67N
Operators/Food (OF)                AS + MC + NO + VE     16S, 88M, 94B
Surveillance/Communication (SC)    AR + AS + MC + VE     31C
Skilled Technical (ST)             GS + MC + MK + VE     54B, 91A, 95B

Figure 4.6. ASVAB subtests, aptitude area (AA) composites, and
synthetic validity project MOS chosen with AA composites.
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Chapter  5.  Standard  Setting  Instruments 

Lauress  L.  Wise  (AIR),  R.  Gene  Hoffman  (HumRRO), 
Wei  Jing  Chia  (AIR),  and  Carolyn  Hill  Fotouhi  (HumRRO) 


Phase  II  results  led  to  several  modifications  of  the 
standard setting procedures. The Soldier-Based Standard Setting
instrument  was  dropped  from  consideration.  The  Critical  Incident 
scale  was  renamed,  and  its  basis  for  task  dimensions  shifted  to 
reflect  Army  Task  Questionnaire  content.  The  Task-Based  Standard 
Setting  Questionnaire  was  extensively  overhauled,  changing  its 
content  to  conform  to  the  Army  Task  Questionnaire,  modifying  the 
information  presented  and  the  method  of  presentation,  and 
altering  the  response  format. 

The standard setting instruments are designed to obtain
standards, not on whole jobs, but on components of the job. For
the Phase II instruments, those components were taken from the
Hybrid Questionnaire. Phase II results led to the adoption of
the Army Task Questionnaire as the recommended job description
instrument. Thus, for Phase III, new standard setting dimensions
were required. In an attempt to obtain some level of
generalizability, Army Task Questionnaire lettered dimensions
(see Figure 3.1) were used to define job components rather than
the separate task categories. After a review of the MOS to be
included in Phase III, six task dimensions were identified as
appropriate for the standard setting exercises. A constraint on
dimension selection was that Project A criterion data had to be
available for every dimension that was used. The task dimensions
and the MOS to which they were assigned are presented in Table 5.1.
One task dimension (Individual Combat) was common to all MOS.
The other dimensions attempted to capture one or two other
components that are more specific to the content of each MOS.

Both Phase III standard setting instruments are designed to
establish performance standards that differentiate four levels of
job performance. These levels, identified early in the project,
are Unacceptable performance, Marginal performance, Acceptable
performance, and Outstanding performance. Figure 5.1 presents
their definitions. Notice that the performance levels are defined
in terms of the behavior of the soldiers and the consequences to
the Army.

Table 5.1

Performance Dimensions for Phase III Standard Setting

                                               MOS
Dimension                          12B 13B 27E 29E 31C 31D 51B 54B 55B 95B 96B

B. Electrical & Electronic Main't  X X X X
D. Vehicle & Equipt. Operations    XX XXX
H. Clerical                        X XXX
I. Communications                  XXX XX
M. Individual Combat               XXXXXXXXXXX
N. Crew-served Weapons


Unacceptable:  Soldiers who consistently perform like this
               should not have been selected for this MOS.
               Their performance is hurting the Army.
               Additional training would not bring their
               performance up to acceptable levels.

Marginal:      Soldiers who consistently perform like this
               need extra or remedial training. Their
               current performance is of little or no
               benefit to the Army.

Acceptable:    Soldiers who consistently perform like this
               are doing an adequate job. They are making
               positive contributions to the Army.

Outstanding:   Soldiers who consistently perform like this
               are doing extremely well. They are making
               exceptional contributions to the Army and are
               good examples to other soldiers.

Figure 5.1. Performance level definitions for standard setting
exercises.


In the literature cited in Chapter 1 and reviewed for this
project by Pulakos, Wise, Arabian, Heon, and Delaplane (1989),
standard setting procedures are typically applied to a particular
test, and standards for that test are the desired end-product.
In the context of synthetic validation, standard setting
procedures have a much broader focus and are further removed from
the eventual end-product. There is no particular performance
test that is the focus of attention. Rather, the ultimate focus
is on setting cutoffs on predictor tests. Setting job component
standards is just a step in that direction. Linkage of the job
component standards to predictor standards requires that the
component standard be expressed in distribution terms rather than
in performance score terms. (See Chapter 6.) Thus, the three
cutoff points dividing the four levels of performance are most
conveniently expressed as percentile scores. For both the
Behavioral Incident and Task-Based Standard Setting instruments,
SME ratings are expressed as percentiles derived from the
distributions of soldier performance obtained in Project A
Concurrent Validation. Thus, the Marginal cutoff score is the
minimum performance, expressed as a percentile score, needed to
be classified as at least Marginal. All scores below that cutoff
are less than Marginal; all scores at or above that cutoff are at
least Marginal. Analogous interpretations hold for the other
cutoff levels.
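The cutoff interpretation described above can be illustrated with a short sketch. The cutoff values and the function name `classify` are hypothetical, chosen only for illustration; they are not taken from the project's scoring software.

```python
from bisect import bisect_right

LEVELS = ["Unacceptable", "Marginal", "Acceptable", "Outstanding"]


def classify(percentile, cutoffs):
    """Return the performance level for a percentile score.

    cutoffs holds the minimum percentile for Marginal, Acceptable, and
    Outstanding, in ascending order. A score equal to a cutoff counts
    as having reached that level.
    """
    return LEVELS[bisect_right(cutoffs, percentile)]


# Hypothetical cutoffs, for illustration only.
cutoffs = [40.0, 62.5, 77.5]
for score in (10.0, 40.0, 70.0, 95.0):
    print(score, classify(score, cutoffs))
```

Because each cutoff is a minimum, three numbers are enough to partition the whole percentile range into the four performance levels.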

The  remainder  of  this  chapter  will  describe  the  revised 
instruments,  present  their  reliability  estimates,  and  discuss  the 
standards  obtained  from  them.  Chapter  6  of  this  report  describes 
the  linkage  of  these  performance  standards  to  predictor  battery 
standards.

Behavioral  Incident  Standard  Setting  Questionnaire 

The Behavioral Incident Standard Setting Questionnaire
requires respondents to rate samples of job performance as either
Unacceptable, Marginal, Acceptable, or Outstanding. The most
obvious change to this instrument from Phase II is a change in
its name. This standard setting instrument was originally called
the Critical Incident Standard Setting Questionnaire because it
was developed from Project A critical incidents used in
constructing MOS-Specific Behaviorally Anchored Rating Scales
(MOS-Specific BARS). However, for use in standard setting, the
term "critical" is inappropriate. The incidents are meant to be
examples of performance anywhere along the continuum from poor to
outstanding. In Phase II, raters had a tendency to focus on each
individual incident and as a result had a problem comparing the
incidents to the performance level definitions (e.g., a single
incident of poor judgment does not make a soldier unacceptable).
Therefore, the title of the instrument and the instructions were
changed to emphasize that each incident should be judged as a
representative sample of a pattern of behavior.

Separate Behavioral Incident scales were constructed for the
task dimensions identified in Table 5.1. A sample scale may be
found in Appendix A, Attachment 2. The complete set of scales is
in Volume II. These six dimensions are defined by a varying
number of somewhat broad task categories on the Army Task
Questionnaire. (See Chapter 3 of this report for a complete
description of the Army Task Questionnaire.) For example,
Dimension B (Electrical and Electronic Systems Maintenance) is
defined by 5 task categories (categories 7 through 11), whereas
Dimension M (Individual Combat) is described by 10 task
categories (categories 71 through 80). In selecting incidents
for each scale, we examined the critical incidents that were used
to construct the Project A MOS-Specific BARS for the nine Batch A
MOS. Five factors were considered in selecting items for the
Behavioral Incident scales.

1.  Comprehensive  coverage  of  the  dimension.  To  ensure 
adequate  coverage  of  all  task  categories  within  a  dimension,  an 
equal  number  of  incidents  was  selected  for  each  task  category. 
Dimension  B,  for  example,  consists  of  five  task  categories; 
therefore,  four  incidents  were  selected  to  represent  each  task 
category.  For  Dimension  M,  which  consists  of  10  task 
categories,  two  incidents  were  selected  to  depict  each  category. 

2.  Full  range  of  performance  effectiveness.  Using  the  mean 
effectiveness  ratings  obtained  during  Project  A  BARS  development, 
incidents  were  selected  to  represent  the  full  range  of 
performance  effectiveness.  Although  it  was  difficult  to  find 
incidents  at  the  midpoint  of  the  scale  (i.e.,  means  of  4.0  to 
6.0),  several  incidents  were  available  for  selection  from  the 
extremes  of  the  scale  (i.e.,  means  of  1.0  to  3.0  and  7.0  to  9.0). 

3.  High  rater  agreement  of  performance  effectiveness.  In 
addition  to  effectiveness  scale  values,  we  examined  interrater 
agreement.  Specifically,  effectiveness  scale  value  standard 
deviations  were  examined.  Incidents  with  standard  deviations 
greater  than  2.0  were  avoided;  however,  five  incidents  with 
standard  deviations  of  2.0  or  greater  were  inadvertently  included 
on  three  different  scales. 

4. Representative of a variety of MOS. Incidents were sampled
from as many Batch A MOS as possible. The composition of the
dimension controlled this to some extent. For example,
Dimensions H (Clerical) and I (Communication) yielded a sample of
incidents from almost all nine Batch A MOS. Dimension N
(Crew-Served Weapons), on the other hand, was by definition
limited to a sample of 13B and 19K incidents.

5. Avoidance of disciplinary incidents. Because the goal
of the standard setting exercises is to identify levels of
performance on areas of job performance, incidents depicting
disciplinary actions were not selected.

Qualitative  Feedback 

During the Phase III workshops, subject matter experts
(SMEs) raised several issues regarding practical aspects of
administering both the Behavioral Incident and Task-Based
standard setting exercises and offered comments about the
instruments themselves. Most of the direct comments from SMEs
regarding the Behavioral Incident exercise focused on suggested
revisions to the current form. Most raters wanted more
information about the soldier depicted in the incident.
Specifically, they wanted to replace a single incident with a
short history of the soldier's performance. Because SMEs
understood that according to the instructions they were to
generalize from a single incident to a soldier who consistently
performs in the manner described, the desire for more information
does not appear to be due to a misunderstanding of the task they
were to perform. The request for additional information seemed
to stem from a desire to give Marginal and particularly
Unacceptable soldiers "the benefit of the doubt" or "a break."
That is, SMEs seemed to be looking for some redeeming qualities
of the soldier in every incident. Given that effectiveness
ratings are available only on single incidents, extending those
incidents to short histories is not a viable alternative.
However, it may be possible to modify the instructions to provide
a better frame of reference to the SMEs. During Phase III, the
question raters were to answer for each incident was "If a
soldier CONSISTENTLY performed duties in this area at a level of
effectiveness like the example incident, what kind of soldier
would this be (Unacceptable, Marginal, Acceptable, or
Outstanding)?" The "consistency" wording could be followed by
"In other words, you wouldn't be surprised if an Unacceptable,
Marginal, Acceptable, or Outstanding soldier performed in the
manner described by the incident." In some of the Phase III
workshops, the "you wouldn't be surprised" explanation was used
to augment the original instructions, and it seemed to facilitate
explanation of the standard setting task.

A  second  suggested  revision  targeted  the  incidents 
describing  fatalities.  Dimension  M  (Individual  Combat)  included 
two  incidents  in  which  the  soldier  was  killed  as  a  result  of  his 
actions.  Item  4  (see  Appendix  A,  Attachment  2)  depicts  a  soldier 
performing  a  heroic  act  which  results  in  his  death.  Item  11 
recounts  the  death  of  a  soldier  and  his  NCOIC  as  a  result  of  the 
soldier's  careless  behavior.  In  both  cases,  the  items  are 
unanswerable  given  the  "consistently"  wording  of  the 
instructions.  Secondly,  the  SMEs  pointed  out  that  item  4 
actually  has  two  outcomes.  The  heroic  act  saved  several  lives, 
which  is  an  example  of  Outstanding  performance.  However,  the 
soldier  died,  and  this  is  Unacceptable  performance.  If  the 
Behavioral  Incident  exercise  is  to  be  used  in  future  standard 
setting  workshops,  incidents  describing  fatalities  should  be 
replaced,  and  all  incidents  should  be  reviewed  to  ensure  that 
they  portray  a  single  outcome. 

Aside  from  specific  comments  made  by  workshop  participants, 
workshop  leaders  made  some  general  observations  about  the 
Behavioral  Incident  exercise.  For  the  most  part,  SMEs  do  not 
feel  that  they  are  setting  performance  standards  with  this 
format.  Rather,  they  feel  that  they  are  making  decisions  about 
an individual soldier. An underlying assumption of the
examinee-based standard setting procedures is that raters are more
accustomed to making decisions about individuals than about
items; therefore, standard setting procedures should tap that
strength by having raters make decisions about individuals. On
the other hand, Hambleton (1978) emphasizes the importance of
raters clearly understanding the task they are to perform and how
the final standard will be determined. If the Behavioral
Incident exercise is used as an operational standard setting
procedure, the instructions should be expanded to include the
manner in which the data will be used to determine the final
standard.

A  second  observation  is  that  SMEs  sometimes  have  difficulty 
determining  whether  they  should  substitute  an  MOS-specific 
incident  for  the  provided  incident  or  use  the  Cannot  Rate  option. 
For example, a 24-month 31C may make minor repairs to a
generator,  but  he  or  she  does  not  use  STE-ICE  as  illustrated  in 
the  incident.  Should  the  soldier  described  in  this  incident  be 
rated  Outstanding  because  using  STE-ICE  is  above  and  beyond  what 
is  expected,  or  should  the  incident  receive  a  Cannot  Rate  because 
31C  technical  equipment  cannot  be  substituted  for  STE-ICE? 
Frequent  selection  of  the  Cannot  Rate  option  may  present  a 
scoring  problem,  especially  for  incidents  with  mean  effectiveness 
ratings  around  the  midpoint  of  the  effectiveness  scale.  As 
mentioned  earlier,  a  greater  number  of  incidents  were  available 
for  inclusion  on  the  Behavioral  Incident  instruments  at  the 
extremes  of  the  effectiveness  scale  than  were  available  around 
the  midpoint.  Thus,  if  several  incidents  around  the 
effectiveness  scale  midpoint  are  scored  Cannot  Rate,  there  will 
be  fewer  scale  values  available  for  setting  standards  at  that 
point  on  the  scale.  Reducing  the  number  of  scale  values  at  the 
midpoint  by  frequent  use  of  the  Cannot  Rate  option  may 
inadvertently  lead  to  more  stringent  or  more  lenient  standards 
than  SMEs  intended. 

Finally, workshop leaders observed that some SMEs make
erroneous assumptions about technical equipment mentioned in the
incident. For example, an SME may not fully understand how the
equipment described operates. In trying to substitute a
comparable  piece  of  technical  equipment,  the  SME  may  assume  that 
the  equipment  mentioned  in  the  incident  is  considerably  more  or 
less  complicated  than  it  actually  is.  Thus,  he  may  substitute  a 
more  or  less  complex  piece  of  equipment.  The  effects  of  these 
substitution  mistakes  on  performance  standards  remain  unknown. 

Data  Editing  and  Scoring  Procedures 

A two-stage process was used in creating scores for each
Behavioral Incident dimension rated by each judge. In the first
stage, the effectiveness scale value for each behavioral incident
was converted to a percentile score. This process, which
involved use of incident effectiveness scale values from Project
A retranslation workshops (Toquam et al., 1988) and incumbent
ratings from the Project A Concurrent Validation (Young, Houston,
Harris, Hoffman, & Wise, 1990), is described below. In the
second stage, percentile cut scores were generated from the
performance level ratings of the incidents for the dimension.
Three cut scores were computed, defining the minimal performance
levels for Marginal, Acceptable, and Outstanding performance,
respectively. As described below, two different methods for
defining the cut scores were examined. Scores for some
combinations of judges and dimensions were dropped at this stage
due to missing or inconsistent data.

Conversion to Percentile Scores. Effectiveness scale values
for each incident were collected in Project A retranslation
workshops using a 9-point scale, with 1 representing extremely
ineffective and 9 representing extremely effective performance.
Selected incidents were then used as anchors for the MOS-Specific
BARS. The BARS used a 7-point scale. The first step in creating
percentile scores was to translate the effectiveness scale values
to the 7-point scale using the same translation that had been
used for the anchor incidents. The translation used was

B = .25 + .75*R

where B was the BARS effectiveness level and R was the
retranslation effectiveness level on the original 9-point scale.
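As a quick check on the translation above, the following minimal sketch applies it to the end points and midpoint of the 9-point scale. The function name is hypothetical; only the formula comes from the text.

```python
def retranslation_to_bars(r):
    """Map a 9-point retranslation value R onto the 7-point BARS scale."""
    return 0.25 + 0.75 * r


# The end points of the 9-point scale map onto the end points of the
# 7-point scale, and the midpoint (5) maps onto the midpoint (4).
for r in (1, 5, 9):
    print(r, retranslation_to_bars(r))
```

The linear form preserves the relative spacing of incidents while compressing the range from 1 through 9 to 1 through 7.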

During  Project  A  development,  each  incident  was  sorted  into 
a  particular  retranslation  dimension.  (It  should  be  noted  that 
Project  A  retranslation  dimensions,  MOS-Specific  BARS  dimensions, 
and  Behavioral  Incident  task  dimensions  are  not  defined  in  the 
same  manner.  Upon  careful  review,  it  can  be  seen  that  the 
retranslation  dimensions  and  BARS  dimensions  are  closely  related; 
while  the  task  dimensions  are  clearly  unique.)  In  a  few  cases, 
retranslation  dimensions  were  combined  in  forming  the  MOS- 
Specific  BARS.  For  each  incident  in  the  Behavioral  Incident 
Questionnaire,  we  converted  effectiveness  scale  values  into 
performance  percentiles  by  computing  the  percent  of  incumbents  in 
the  Project  A  Concurrent  Validation  who  had  a  mean  (combined  peer 
and  supervisor)  effectiveness  scale  value  on  the  corresponding 
BARS  dimension  that  was  less  than  the  BARS  effectiveness  scale 
value  for  the  incident  in  question.  Figure  5.2  shows  a  plot  of 
the  percentile  scores  by  the  original  effectiveness  scale  values 
for  each  incident.  Each  incident  is  plotted  with  the  letter 
indicating  the  Army  Task  Questionnaire  dimension  it  represents. 
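The empirical percentile computation described above can be sketched as follows. The incumbent ratings are invented for illustration, and the function name is an assumption; the only rule taken from the text is that the percentile is the percent of incumbents whose mean rating is less than the incident's BARS value.

```python
def empirical_percentile(incident_bars_value, incumbent_means):
    """Percent of incumbents whose mean BARS rating is strictly below
    the BARS effectiveness value of the incident."""
    below = sum(1 for m in incumbent_means if m < incident_bars_value)
    return 100.0 * below / len(incumbent_means)


# Hypothetical mean (peer and supervisor) ratings for eight incumbents
# on a single BARS dimension, 7-point scale.
incumbent_means = [2.5, 3.0, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5]
print(empirical_percentile(4.75, incumbent_means))
```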

Figure 5.2. Empirical percentile scores plotted against
behavioral incident effectiveness scale values.


One concern in the computation of percentile scores for each
incident was that the BARS are MOS-specific, and so the
percentile scores are derived from nine different MOS. If the
original incident effectiveness scale values were on a relative
scale (relative to the abilities of the incumbents in a
particular MOS), this would not cause a problem; the same
relationship between scale value and percentile rank might hold
for all dimensions and MOS. If the original effectiveness scale
values were more absolute, however, the ability levels of
incumbents in different MOS might alter the relationship between
scale values and percentile scores. To test this hypothesis, we
ran an analysis of covariance examining the relationship between
BARS dimension and percentile score controlling for three
polynomial levels of effectiveness: mean effectiveness,
(effectiveness - 5) squared, and (effectiveness - 5) cubed. We
also included the interaction of dimension with mean
effectiveness. The results indicated no significant differences
in percentile scores associated with BARS dimension or with the
interaction of BARS dimension and effectiveness (p = .87 and .50,
respectively). Each of the polynomial terms for effectiveness
had a highly significant relationship to percentile scores (p <
.0001 in all cases). We thus concluded that differences among
BARS dimensions (and hence also among MOS within which the BARS
dimensions were nested) could be ignored. [1]


[1] At first glance, the use of percentile scores (normally
associated with a rectangular distribution) in a regression
analysis may be questioned. However, it should be pointed out
that we are examining the sample of incidents, not the persons
who were rated. For the incidents, the distribution of
percentile values is not expected to be rectangular. On the
other hand, it is not normal either. Because there were few
mid-range instances available from the critical incident
developmental process, the distribution of incident scale values
and incident percentile values is bimodal. Of course this
situation also violates assumptions needed for statistical tests
of the relationship between scale value and percentile value.
However, describing the relationship between scale value and
percentile value for the incidents in terms of a polynomial
equation violates no assumptions. Furthermore, the strength of
the association combined with the general robustness of
regression suggests that our assertion is reasonable: There
exists a strong curvilinear relationship between incident scale
values and percentile scores.
A subsequent analysis was run, without BARS dimension as a
variable,  to  establish  a  single  equation  which  would  be 
applicable  to  all  incidents  and  could  be  used  to  translate 
effectiveness  scale  values  to  percentile  scores.  In  that 
analysis,  we  found  that  the  fourth-order  polynomial  term  was  also 
significant.  This  function  explained  98%  of  the  variance  in  the 
individual  empirical  percentile  estimates.  We  used  these 
computed  scores  in  preference  to  the  empirical  percentile 
estimates  in  part  because  there  were  five  incidents  that  had  been 
sorted  into  task  dimensions  but  were  not  used  in  creating  the 
MOS-Specific  BARS  (so  no  empirical  scores  were  available)  and  in 
part  as  a  means  of  performing  a  minimal  smoothing  of  the  data. 

The  final  function  used  to  compute  percentile  scores  was: 

P = -.236*(E-5)**4 - .300*(E-5)**3 + 5.400*(E-5)**2 + 18.01*E - 63.78

where  E  is  the  incident  effectiveness  scale  value  and  P  the 
associated  percentile  score. 
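The function can be evaluated directly, as in the sketch below. The function name is hypothetical and the coefficients are used exactly as printed above. With those coefficients the result falls slightly outside the 0 to 100 range at the ends of the scale; the text does not say how such values were handled, so the sketch leaves them unclipped.

```python
def percentile_from_effectiveness(e):
    """Fourth-order polynomial mapping a 9-point effectiveness scale
    value E to a percentile score P, coefficients as printed."""
    d = e - 5.0
    return (-0.236 * d**4 - 0.300 * d**3 + 5.400 * d**2
            + 18.01 * e - 63.78)


for e in (1.0, 3.0, 5.0, 7.0, 9.0):
    print(e, round(percentile_from_effectiveness(e), 1))
```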

Creating Percentile Cut Scores. The second stage in scoring
the Behavioral Incident Questionnaire was to convert SMEs'
performance level ratings on the incidents to cut scores which
indicate the minimum percentile scores for Marginal, Acceptable,
and Outstanding performance levels. This conversion was done
separately for each judge-by-dimension combination.

Two scoring methods were examined. The first, referred to
as the "Average" method, defined each cutoff as the midpoint
between the average percentile score for all incidents rated at
one level (e.g., Unacceptable) and the average percentile score
for all incidents rated at the next higher level (e.g.,
Marginal). For example, consider Table 5.2, which depicts a
completed hypothetical Behavioral Incident Questionnaire for a
single judge and a single dimension. In this example, the
average percentile score for all incidents rated Unacceptable is
30, the average for all incidents rated Marginal is 49, the
average for all Acceptable incidents is 74.2, and the average for
all Outstanding incidents is 87.5. Using the Average scoring
method, this judge's cutoff, in percentile scores, for Marginal
performance is 39.5, for Acceptable performance is 61.6, and for
Outstanding performance is 80.8.

With the Average method, if there were no incidents rated as
Outstanding,  100  was  used  as  the  average  rating  for  Outstanding 
performance.  If  no  incidents  were  rated  Unacceptable,  zero  was 
used  as  the  average  rating  for  Unacceptable  performance.  If  no 
incidents  were  rated  as  Marginal,  the  midpoint  between  the 
averages  for  Unacceptable  and  Acceptable  was  used  as  the  lower 
limit  for  both  the  Marginal  and  Acceptable  performance  levels. 
Similarly,  if  no  incidents  were  rated  as  Acceptable,  the  midpoint 
between  the  averages  for  Marginal  and  Outstanding  was  used  as  the 
cutoff  for  both  Acceptable  and  Outstanding. 
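The Average method and its fallback rules can be sketched as follows, using the hypothetical ratings of Table 5.2 as input. The function name and data layout are assumptions made for illustration. The case in which neither Marginal nor Acceptable incidents exist is not covered by the text, so the sketch simply refuses to score it.

```python
LEVELS = ("Unacceptable", "Marginal", "Acceptable", "Outstanding")


def average_cutoffs(ratings):
    """Return (Marginal, Acceptable, Outstanding) cutoffs for one judge
    on one dimension. ratings is a list of (level, percentile) pairs."""
    scores = {level: [] for level in LEVELS}
    for level, pct in ratings:
        scores[level].append(pct)
    mean = {lvl: sum(v) / len(v) if v else None for lvl, v in scores.items()}

    # Fallbacks stated in the text for empty extreme levels.
    u = 0.0 if mean["Unacceptable"] is None else mean["Unacceptable"]
    o = 100.0 if mean["Outstanding"] is None else mean["Outstanding"]
    m, a = mean["Marginal"], mean["Acceptable"]

    if m is None and a is None:
        # Not described in the text; treated as unscorable in this sketch.
        raise ValueError("no Marginal or Acceptable incidents")
    if m is None:
        shared = (u + a) / 2          # used for Marginal and Acceptable
        return shared, shared, (a + o) / 2
    if a is None:
        shared = (m + o) / 2          # used for Acceptable and Outstanding
        return (u + m) / 2, shared, shared
    return (u + m) / 2, (m + a) / 2, (a + o) / 2


# Hypothetical ratings from Table 5.2.
table_5_2 = (
    [("Unacceptable", p) for p in (25, 25, 30, 35, 35)]
    + [("Marginal", p) for p in (45, 45, 50, 50, 55)]
    + [("Acceptable", p) for p in (70, 70, 75, 75, 75, 80)]
    + [("Outstanding", p) for p in (75, 90, 90, 95)]
)
print([round(c, 1) for c in average_cutoffs(table_5_2)])  # [39.5, 61.6, 80.8]
```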
Table 5.2

Completed Hypothetical Behavioral Incident Questionnaire for a
Single Judge's Ratings on a Single Dimension

             Performance      Percentile
Incident     Level            Score

    1        Unacceptable        25
    2        Unacceptable        25
    3        Unacceptable        30
    4        Unacceptable        35
    5        Unacceptable        35
    6        Marginal            45
    7        Marginal            45
    8        Marginal            50
    9        Marginal            50
   10        Marginal            55
   11        Acceptable          70
   12        Acceptable          70
   13        Acceptable          75
   14        Acceptable          75
   15        Acceptable          75
   16        Acceptable          80
   17        Outstanding         75
   18        Outstanding         90
   19        Outstanding         90
   20        Outstanding         95

The second scoring method, used exclusively in Phase II
scoring and herein referred to as the "End-Point" method, defined
each cutoff as the midpoint between the maximum percentile score
for the incidents rated at one level (e.g., Unacceptable) and the
minimum percentile score for the incidents rated at the next
higher level (e.g., Marginal). Once again, consider the
hypothetical data in Table 5.2. The maximum percentile score for
incidents rated Unacceptable is 35, and the minimum percentile
score for incidents rated Marginal is 45. Therefore, the cutoff
for Marginal performance is 40. Similarly, the cutoff that
defines Acceptable performance is 62.5. The same computational
procedure is used even when there are reversals in the ratings.
For the sample data, the highest Acceptable incident is 80, and
the lowest Outstanding incident is 75. The cutoff, therefore, is
77.5.
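The End-Point computation can be sketched in the same way, again using the hypothetical Table 5.2 ratings. The function name is an assumption. The text does not describe how empty performance levels were handled under this method, so the sketch raises an error in that case.

```python
def end_point_cutoffs(ratings):
    """Return (Marginal, Acceptable, Outstanding) cutoffs as midpoints
    between the highest score at one level and the lowest score at the
    next higher level. ratings is a list of (level, percentile) pairs."""
    levels = ("Unacceptable", "Marginal", "Acceptable", "Outstanding")
    scores = {level: [] for level in levels}
    for level, pct in ratings:
        scores[level].append(pct)
    if any(not scores[level] for level in levels):
        # Handling of empty levels is not described for this method.
        raise ValueError("every performance level needs a rated incident")
    return tuple(
        (max(scores[lower]) + min(scores[upper])) / 2
        for lower, upper in zip(levels, levels[1:])
    )


# Hypothetical ratings from Table 5.2.
table_5_2 = (
    [("Unacceptable", p) for p in (25, 25, 30, 35, 35)]
    + [("Marginal", p) for p in (45, 45, 50, 50, 55)]
    + [("Acceptable", p) for p in (70, 70, 75, 75, 75, 80)]
    + [("Outstanding", p) for p in (75, 90, 90, 95)]
)
print(end_point_cutoffs(table_5_2))  # (40.0, 62.5, 77.5)
```

Only two scores enter each cutoff, which is why a single reversed rating (incident 17 in the example) moves the Outstanding cutoff.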
Where the percentile distributions overlap for incidents
rated at two different performance levels, the End-Point method
will minimize the maximum "error", where error refers to the
extent to which an incident at a lower (higher) performance level
has a percentile score above (below) the computed cutoff. The
Average method, on the other hand, is likely to produce more
stable estimates because scores for all items at each performance
level are used rather than just one score from each performance
level.

In screening the Behavioral Incident Questionnaire data, we
eliminated cases where the judge was unable to rate 5 or more of
the 20 incidents for a dimension. This resulted in the
elimination of data for 78 combinations of judges and dimensions.
(For eight judges, two dimensions were dropped, and for five
judges, three dimensions were dropped. In all other cases, only a
single dimension was dropped for any one judge.) We also dropped
a few cases where the cut scores were reversed and the difference
was more than four standard deviations of the average of the
differences between cut scores. This resulted in cases being
dropped if the difference between two Average method cut scores
was reversed by more than 0.5 percentage points or the difference
between two End-Point method cut scores was reversed by more than
2.0 percentage points. Where a case was dropped for either
missing data or reversals, all scores were deleted to maintain
comparability in the samples used to evaluate the two different
methods. (In all, 20 of the 23 cases dropped for reversals were
dropped because of reversals in the Marginal and Acceptable cut
scores computed by the End-Point method.) Table 5.3 shows the
number of cases dropped for each dimension.
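The screening rules above can be summarized in a short sketch. The function and parameter names are hypothetical; the thresholds (5 or more unrated incidents, reversals of more than 0.5 and 2.0 percentage points) are those stated in the text.

```python
def worst_reversal(cuts):
    """Largest amount by which a lower cutoff exceeds the next higher
    cutoff; zero or negative when the cutoffs are in order."""
    return max(lower - upper for lower, upper in zip(cuts, cuts[1:]))


def keep_case(n_unrated, average_cuts, end_point_cuts):
    """Return True if a judge-by-dimension case survives screening.

    A dropped case loses its scores under both methods, so the same
    sample is used to evaluate each method.
    """
    if n_unrated >= 5:                        # too many Cannot Rate responses
        return False
    if worst_reversal(average_cuts) > 0.5:    # Average method reversal
        return False
    if worst_reversal(end_point_cuts) > 2.0:  # End-Point method reversal
        return False
    return True


print(keep_case(2, (39.5, 61.6, 80.8), (40.0, 62.5, 77.5)))
print(keep_case(5, (39.5, 61.6, 80.8), (40.0, 62.5, 77.5)))
```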

Descriptive  Statistics 

Table  5.4  presents  means  and  standard  deviations  across 
raters  for  the  standards  set  by  each  MOS.  Results  for  both 
scoring  methods  are  presented.  It  should  be  noted  that  these 
means  represent  the  MOS  performance  cutoffs  and  that  the  standard 
deviations  are  one  index  of  within-MOS  rater  agreement. 

MOS  statistics  (i.e.,  the  MOS  means  and  standard  deviations 
presented  in  Table  5.4)  were  averaged,  by  dimension,  across  the 
MOS.  Table  5.5  presents  a  summary  of  those  results.  Thus,  for 
Dimension  B,  four  MOS  are  represented.  As  calculated  by  the 
Average  method,  the  mean  for  those  four  MOS  cutoffs  for  Marginal 
performance  is  6.33,  and  the  four  MOS  cutoffs  have  a  standard 
deviation  of  1.54.  Also  the  means  and  standard  deviations  across 
MOS  are  presented  for  the  within-MOS  standard  deviations.  Thus, 
as  calculated  by  the  Average  method,  the  mean  standard  deviation 
for  the  Marginal  cutoff  for  the  four  MOS  rating  Dimension  B  is 
7.97,  and  the  across-MOS  standard  deviation  of  those  within-MOS 
standard deviations is 3.15.

Table 5.3

Number of Behavioral Incident Questionnaires Dropped During Edits

                                                    Reason Dropped
Dimension                                  Valid   Missing   Reversals   Total

B. Electrical & Electronic Maintenance       160        17           1     178
D. Vehicle & Equipment Operations            351        14           2     367
H. Clerical                                  210        23           1     234
I. Communications                            268        14           1     283
M. Individual Combat                         655        10          17     682
N. Crew-served Weapons                        72         0           1      73

Total                                       1716        78          23    1817

Across  all  30  MOS  and  dimension  combinations,  the  cutoffs 
for the three performance levels are widely spread. Minimum
performance for the Marginal level is at the 6th to 7th percentile.
Minimum performance for the Acceptable level jumps up to the 32nd
percentile based on the End-Point method and the 39th percentile for
the Average method. For performance to be considered Outstanding
requires percentile scores in the upper 70s. Given the wide
spread  of  the  cutoffs  across  the  three  levels,  the  variation  in 
the  means  across  MOS  and  dimensions  seems  slight.  These 
differences  are  explored  further  below. 

There  are  two  perspectives  on  the  meaning  of  the  within-MOS 
standard  deviation.  Again  considering  the  wide  spread  between 
the  performance  levels,  these  within-MOS  standard  deviations, 
which  average  from  7.5  to  18.3  depending  on  level  and  scoring 
method,  suggest  that  raters  do  not  have  major  disagreements 
(i.e.,  one  rater's  Marginal  is  not  likely  to  be  another  rater's 
Outstanding).  However,  focusing  on  any  one  cutoff,  these 
standard  deviations  appear  rather  large,  particularly  for  the 
Marginal  and  Outstanding  percentile  cutoffs.  At  these  extremes, 
percentile  differences  on  the  order  of  10  points  or  so  would  be 
translated  into  sizable  test  score  differences.  However, 
standards will not be based on a single rater, and the more
appropriate statistic to consider is the standard error of the
mean. If we project using 60 raters to set standards, as
suggested in Chapter 3 to ensure representativeness, then
standard deviations of 7.5 to 18.3 result in standard errors of
the mean of 1.0 to 1.6 for the Average scoring method and 1.4 to
2.4 for the End-Point scoring method.
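The projection above follows from dividing a within-MOS standard deviation by the square root of the number of raters. A minimal sketch, assuming the usual formula for the standard error of a mean and using only the two endpoint standard deviations quoted in the text (the function name is illustrative):

```python
import math


def standard_error_of_mean(sd: float, n_raters: int) -> float:
    """Standard error of a mean cutoff based on n_raters independent raters."""
    return sd / math.sqrt(n_raters)


# Endpoints of the within-MOS standard deviations reported in the text,
# projected to the 60 raters suggested in Chapter 3.
for sd in (7.5, 18.3):
    print(f"SD = {sd:5.1f}  ->  SE of mean = {standard_error_of_mean(sd, 60):.1f}")
```

With 60 raters the smallest standard deviation (7.5) gives a standard error of about 1.0 and the largest (18.3) about 2.4, matching the ends of the ranges reported for the two scoring methods.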

One  other  observation  may  be  made  concerning  these 
statistics.  The  Marginal  cutoffs,  by  either  method,  suggest 
that,  on  the  average,  currently  6  to  7%  of  the  soldier  population 
is Unacceptable and should not be in the MOS. In some of the
workshops,  SMEs  had  some  problems  understanding  this  category. 
That  is,  they  would  argue  that  if  a  soldier  were  Unacceptable  he 
or  she  wouldn't  be  in  the  Army.  We  sometimes  augmented  the 
description  by  calling  these  Unacceptable  soldiers  selection 
mistakes.  From  these  data,  the  collection  of  SMEs  that 
originally  provided  the  Project  A  scale  development  effectiveness 
values,  those  that  gave  BARS  ratings  to  soldiers  during  Project  A 
concurrent  validation,  and  those  that  participated  in  the 
synthetic  validity  workshops  recognize  that  such  mistakes  exist. 
On  the  other  side  of  the  coin,  this  system  leads  to  the 
conclusion  that  upwards  of  20  to  25%  of  soldiers  are  "making 
exceptional  contributions  to  the  Army"  as  Outstanding  soldiers. 

Scoring  method  differences.  Tables  5.4  and  5.5  suggest  that 
the  two  scoring  methods  are  not  congruent,  particularly  with 
regard  to  rater  agreement.  Table  5.6  presents  tests  of  the 
differences  among  cutoff  means  and  within-MOS  agreement  (i.e., 
standard  deviations)  for  the  two  scoring  methods.  The  data  from 
Table  5.5  were  treated  as  repeated  measures  data  with  the  MOS-by- 
dimension  combinations  as  cases  and  the  scoring  methods  as 
repeated  "trials."  Within-MOS  means  and  standard  deviations  were 
tested  separately.  Planned  orthogonal  contrasts  were  set  up  to 
compare  means  and  standard  deviations  for  each  of  the  three 
cutoffs.  Each  of  the  six  comparisons  was  statistically 
significant.  The  bottom  row  of  Table  5.6  indicates  that, 
compared  to  the  End-Point  method,  the  Average  method  of  scoring 
the  Behavioral  Incident  data  led  to  lower  cutoffs  for  Marginal 
and  Outstanding  performance  but  a  higher  cutoff  for  Acceptable 
performance.  More  important,  the  Average  method  produced  lower 
within-MOS  standard  deviations,  indicative  of  higher  agreement 
among raters. This latter effect may be due to the Average
method  smoothing  out  aberrant  ratings  on  single  incidents  which 
unduly influence scoring under the End-Point method.

Table 5.6

Planned Comparison for Scoring Method Effects on MOS Cut Score
Means and Variances Using Repeated Measures ANOVA

Source                        SS     df          MS          F        p

Scoring Effect on Cut Score Means:

  Marginal Cut            39.606      1      39.606      5.329    0.028
    Error                215.519     29       7.432
  Acceptable Cut        1274.660      1    1274.660     94.764    0.000
    Error                390.078     29      13.451
  Outstanding Cut        156.682      1     156.682     67.196    0.000
    Error                 67.620     29       2.332

Scoring Effect on Cut Score Standard Deviations
(within-MOS variation):

  Marginal Cut           406.566      1     406.566     26.341    0.000
    Error                447.616     29      15.435
  Acceptable Cut        3158.233      1    3158.233    712.555    0.000
    Error                128.536     29       4.432
  Outstanding Cut        401.941      1     401.941    109.326    0.000
    Error                106.619     29       3.677

Note. Each MOS-by-Dimension combination is a case; means and
standard deviations for ratings within each MOS are the
variables.
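As a reading aid for Table 5.6, each F ratio is the contrast mean square divided by its error mean square. The sketch below recomputes two of the six ratios from the tabled sums of squares and degrees of freedom; the function name is illustrative, and the p values are not recomputed here.

```python
def f_ratio(ss_effect: float, df_effect: int, ss_error: float, df_error: int) -> float:
    """F = (SS_effect / df_effect) / (SS_error / df_error)."""
    return (ss_effect / df_effect) / (ss_error / df_error)


# (SS_effect, df_effect, SS_error, df_error) copied from Table 5.6.
contrasts = {
    "means, Marginal cut": (39.606, 1, 215.519, 29),
    "standard deviations, Acceptable cut": (3158.233, 1, 128.536, 29),
}

for label, values in contrasts.items():
    print(f"{label}: F(1, 29) = {f_ratio(*values):.2f}")
```

The recomputed values agree with the tabled F ratios of 5.329 and 712.555 to within rounding.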


Behavioral  Incident  Reliability  Estimates 

Reliability  estimates  were  computed  for  each  rater  group 
(e.g.,  TRADOC  NCOs,  TRADOC  Officers),  for  rater  groups  combined 
within  command  (e.g.,  TRADOC  total),  for  rater  groups  combined 
across  commands  (e.g.,  combined  NCOs),  and  for  all  raters  across 
rank  and  command.  The  reliability  estimation  procedure  was 
identical  to  that  used  for  the  Army  Task  Questionnaire.  Separate 
estimates  were  computed  for  each  dimension  rated  within  each  MOS 
and  for  each  scoring  method.  Table  5.7  presents  single-rater  and 
overall  reliability  estimates  for  cutoffs  from  the  Average 
scoring  method,  and  Table  5.8  presents  single-rater  and  overall 
reliability  estimates  for  cutoffs  produced  by  the  End-Point 
scoring method. Single-rater reliability estimates are used in
order to compare different MOS, dimensions, and rater groups.
The last column in each table presents reliability estimates for
cutoffs calculated using all available raters. Tables 5.9 and
5.10 present means for the single-rater reliability estimates,
first by dimension across the relevant MOS, and then across all
MOS-by-dimension combinations.

Several  observations  may  be  made.  First,  the  difference 
between  the  scoring  methods  in  rater  agreement  that  appeared  in 
the  within-MOS  standard  deviations  is  apparent.  Single-rater 
reliability  estimates  for  the  Average  scoring  method  are  about 
twice  as  large  as  those  for  the  End-Point  scoring  method.  For  a 
sample  of  only  10  raters,  one  would  expect  Behavioral  Incident 
reliabilities  to  exceed  .95  when  scored  by  the  Average  method  and 
.84  when  scored  by  the  End-Point  method.  Using  a  reliability 
criterion,  the  Average  method  of  scoring  the  Behavioral  Incident 
data  is  superior  to  the  End-Point  method.  The  magnitude  of  the 
difference suggests that the End-Point method be considered no
further.
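The 10-rater projection can be illustrated with the Spearman-Brown formula. This is an assumption: the text says only that the estimation procedure matched the one used for the Army Task Questionnaire and does not name the step-up formula. The single-rater value of .69 is the overall average for the Average scoring method reported in this chapter; the function name is illustrative.

```python
def spearman_brown(single_rater_reliability: float, n_raters: int) -> float:
    """Projected reliability of the mean of n_raters parallel raters."""
    r = single_rater_reliability
    return n_raters * r / (1 + (n_raters - 1) * r)


# Overall average single-rater reliability for the Average scoring method.
print(f"{spearman_brown(0.69, 10):.3f}")
```

Under this assumption, a single-rater reliability of .69 steps up to roughly .96 for 10 raters, consistent with the statement that reliabilities would exceed .95 under the Average method.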

Second,  reliability  estimates  for  raters  combined  across 
rank  and  command  do  not  appear  to  be  systematically  less  than  the 
reliabilities  of  the  separate  groups  with  the  exception  of  the 
Officers.  Officer  reliabilities  appear  somewhat  higher.  Using 
the  reliability  estimates  from  Table  5.7  for  the  Average  scoring 
method  as  dependent  variables,  differences  by  rank,  command,  and 
dimension were tested with a series of ANOVAs. Rank differences
were tested using the combined command reliabilities (columns 8,
9, and 10 in the tables) with preplanned F comparisons. Officers
were more reliable than NCOs (F(1,62) = 12.06, p < .01) and
Civilians were less reliable than NCOs or Officers (F(1,62) = 5.17,
p < .05). Multiple R for the overall rank effects was .16
indicating  that  the  rank  differences  in  reliability,  although 
detectable,  were  not  very  strong.  Command  differences  (comparing 
columns  4  and  7  in  Table  5.7)  were  not  significant.  Further 
tests  of  rank  and  command  differences  are  presented  below. 

A  third  comparison  among  the  reliability  estimates  concerns 
the  dimensions.  The  last  column  in  summary  Table  5.9  suggests 
variation across the dimensions. Dimension differences are
significant (F(5,24) = 4.08, p < .01) with a Multiple R for the
dimension effect of .68. Single-rater reliability estimates vary
from .62 for Individual Combat to .78 for Communications.

Finally,  the  strength  of  these  reliabilities  may  provide  a 
false  sense  of  security.  Reliability  is  as  much  a  function  of 
"true  score"  variance  as  error  variance;  Brennan  (1983)  suggests 
thinking  in  terms  of  signal  to  noise  ratio.  Our  objects  of 
measurement — the  three  cutoff  levels — vary  widely.  They  have  a 
"strong  signal"  which  can  tolerate  a  lot  of  noise,  so  that  we  are 
not  going  to  confuse  the  Marginal  cutoff  with  the  Acceptable 
cutoff. On the other hand, we still need to be concerned about
the precision that can be provided by the instrument for each
cutoff.  We  can  obtain  a  summary  estimate  of  the  error  around 
each  cutoff  by  using  the  standard  deviation  of  mean  cutoffs 
across  all  MOS  and  dimensions  (6.03,  38.91,  and  77.57).  The 
standard  deviation  of  these  three  cutoffs  is  29.23.  Based  on  our 
projection  of  60  raters  and  the  overall  average  single-rater 
reliability  of  .69,  the  standard  error  of  the  measurement  for 
each cutoff is projected to be 2.5. This error estimate is
slightly higher than those provided by the standard error of
the mean calculations (i.e., 1.0 to 1.6). The discrepancy may
arise  because  we  are  using  averages  of  cutoffs,  averages  of 
standard  deviations,  and  averages  of  reliabilities  to  make  our 
error  projections  for  the  instrument  as  a  whole. 
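The 2.5 figure can be reproduced under two assumptions that the text does not state explicitly: that the 60-rater reliability is projected with the Spearman-Brown formula, and that the standard error of measurement is the standard deviation times the square root of one minus that reliability. The three mean cutoffs and the .69 single-rater reliability are taken from the text; variable names are illustrative.

```python
import math
from statistics import pstdev

mean_cutoffs = (6.03, 38.91, 77.57)   # Marginal, Acceptable, Outstanding
single_rater_reliability = 0.69       # overall average, Average scoring method
n_raters = 60

# Population standard deviation of the three mean cutoffs (text reports 29.23).
sd_cutoffs = pstdev(mean_cutoffs)

# Assumed Spearman-Brown projection to 60 raters.
r = single_rater_reliability
reliability_60 = n_raters * r / (1 + (n_raters - 1) * r)

# Standard error of measurement: SD * sqrt(1 - reliability).
sem = sd_cutoffs * math.sqrt(1 - reliability_60)

print(f"SD = {sd_cutoffs:.2f}, reliability(60) = {reliability_60:.4f}, SEM = {sem:.1f}")
```

The projected reliability is about .993, so even a standard deviation near 29 leaves a standard error of measurement of about 2.5 percentile points, which is the point of the signal-to-noise argument above.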

Differences  Among  Standard  Setting  Cutoffs 

For  Behavioral  Incident  standards,  there  are  four  variables 
that may potentially impact on the standard setting results.
These include potential rater group differences of rank and
command,  differences  among  the  MOS,  and  differences  among  the 
various  task  dimensions.  This  section  examines  these  potential 
differences  using  only  the  cutoffs  produced  by  the  Average 
scoring  method. 

The  effects  of  the  four  identified  variables  on  the  three 
cutoff  points  (i.e.,  percent  performing  below  Marginal,  below 
Acceptable,  and  below  Outstanding)  were  tested  simultaneously 
using  each  rater-by-dimension  response  set  as  a  case  and  the 
three  standard  cutoff  points  as  repeated  observations.  Using 
repeated  measures  ANOVA  with  three  "trials"  and  four  grouping 
factors, the between-subjects results provide a test of overall
differences  in  strictness  or  leniency  between  MOS,  between 
dimensions,  between  ranks,  and  between  commands.  The  within- 
subjects  interactions  test  the  consistency  of  the  effects  of 
those  variables  across  the  three  cutoff  points.  Civilians  were 
excluded  from  this  analysis  because  of  their  small  number  and 
uneven  distribution  across  the  MOS.  Table  5.11  presents  the 
results of the repeated measures ANOVA. Both between- and
within-subjects effects are present for MOS and dimension.
Within-subjects effects are also significant for rank-by-level
effects. Command differences do not appear.

Because of the within-subjects interaction effects for MOS,
task  dimension,  and  rank,  separate  ANOVAs  were  run  for  each 
performance  cutoff.  These  results  are  presented  in  Table  5.12. 

MOS  is  the  only  variable  that  appears  to  influence  each 
performance level cutoff. Task dimension differences affect only
the  Acceptable  and  Outstanding  categories,  and  based  on  the  sums- 
of-squares,  dimension  differences  have  the  greatest  impact  on  the 
standards.  The  effects  of  rater  rank  are  found  only  for  the 
Marginal cutoff. Multiple Rs for these effects on the
Marginal, Acceptable, and Outstanding levels are .19, .27, and
.29, respectively. These Multiple Rs suggest that the magnitude
of the MOS, dimension, or rank effects is not large.

Table 5.11

MOS, Task Dimension, Rater Rank, and Rater Command Effects on
Three Levels of Behavioral Incident Standards

Source                             SS     df           MS           F        p

Between-Subjects Effects:

  MOS                        4623.267     10      462.327       3.313    0.000
  Dimension                 13157.101      5     2631.420      18.854    0.000
  Rank                        307.410      1      307.410       2.203    0.138
  Command                       0.001      1        0.001       0.000    0.998
  Subjects w. groups       224287.387   1607      139.569

Within-Subjects Effects:

  Level                   1923856.705      2   961928.353   13692.459    0.000
  MOS x Level                4915.872     20      245.794       3.499    0.000
  Dimension x Level          5674.553     10      567.455       8.077    0.000
  Rank x Level               1743.440      2      871.720      12.408    0.000
  Command x Level             268.257      2      134.129       1.909    0.140
  Level x Subj. w. groups  225791.271   3214       70.252

Figures  5.3,  5.4,  and  5.5  graphically  depict  the  results 
presented  in  Table  5.12.  Figure  5.3  shows  the  standards  for  each 
MOS  that  result  when  standards  are  averaged  for  the  dimensions 
that  are  relevant  to  the  MOS.  Similarly,  Figure  5.4  shows  the 
standards  for  each  dimension  that  result  when  standards  are 
averaged across the MOS that rated the dimension. Finally,
Figure 5.5 presents standards set by NCOs and Officers averaged
across  all  of  the  MOS  and  dimensions.  Except  for  the  Outstanding 
level,  MOS  and  task  dimension  differences  are  rather 
unremarkable.  For  all  levels,  the  NCO  and  Officer  differences  do 
not  appear  striking.  Thus,  although  the  ANOVAs  present 
statistically  significant  differences  among  the  standards  by  MOS, 
dimension,  and  rank,  the  size  of  the  Multiple  Rs  and  the  graphic 
presentation  of  those  differences  suggest  that  in  practical  terms 
the differences may not be very meaningful.

Table 5.12

MOS, Task Dimension, Rater Rank, and Rater Command Effects on
Each Level of Behavioral Incident Standards

Source                      SS     df         MS         F        p

Marginal Cutoff:

  MOS                 1798.773     10    179.877     2.920    0.001
  Dimension            144.678      5     28.936     0.470    0.799
  Rank                1559.485      1   1559.485    25.317    0.000
  Error              99050.685   1608     61.599

Acceptable Cutoff:

  MOS                 1240.026     10    124.003     1.865    0.046
  Dimension           6600.242      5   1320.048    19.853    0.000
  Rank                 118.472      1    118.472     1.782    0.182
  Error             106916.619   1608     66.490

Outstanding Cutoff:

  MOS                 6535.643     10    653.564     4.300    0.000
  Dimension          12085.525      5   2417.105    15.904    0.000
  Rank                 400.007      1    400.007     2.632    0.105
  Error             244379.611   1608    151.977

Task-Based  Standard  Setting  Form 


The  Task-Based  Standard  Setting  Form  attempts  to  obtain 
cutoffs  between  Unacceptable,  Marginal,  Acceptable,  and 
Outstanding  performance  on  task  dimensions  by  using  samples  of 
tasks  as  the  bases  of  judgment.  Several  versions  of  this 
approach  were  used  in  Phase  II,  and  all  of  them  revolved  around 
matching  cutoffs  to  test  scores  on  hands-on  tests  of  the  tasks. 
For  the  Phase  III  version  of  the  instrument,  raters  were 
presented  one  of  the  easier  formats  which  asked  them  to  simply 
indicate  which  test  scores  reflected  the  minimum  level  of 
performance  for  Marginal,  Acceptable,  and  Outstanding 
performance.  In  contrast  to  the  "detailed"  methods  from  Phase 
II,  no  specific  information  was  given  about  the  tests  themselves. 
For  Phase  III,  raters  were,  however,  given  information  about 
distributions  of  performance  on  the  sample  tasks. 

[Figure 5.3. Behavioral Incident percentile cutoffs for each MOS.
Axes: Percentile by MOS; series: Marginal, Acceptable, Outstanding.]

[Figure 5.4. Behavioral Incident percentile cutoffs for each Task
Dimension. (See Table 5.1 for Task Dimension Names.)]

[Figure 5.5. Behavioral Incident percentile cutoffs for NCOs and
Officers.]

For each dimension of job performance on which Phase III
standards were to be set, a separate one-page Task-Based
instrument was prepared. From the pool of tasks with Project A
hands-on performance tests, tasks were identified that
communicated the character of each dimension. Then, three
exemplar  tasks,  labelled  "sample  tasks",  were  selected  for  the 
questionnaire.  Tasks  were  selected  to  be  representative  of  the 
Project  A  performance  distributions  for  the  dimension  as  a  whole. 
The  instrument  presented  the  three  sample  tasks  and  a  single 
performance  distribution  that  represented  the  performance,  based 
on Project A concurrent validation data, across the three
sample  tasks.  For  each  level  of  test  score,  expressed  as  percent 
of  task  steps  performed  correctly  (percent  GO  scores)  and 
presented  in  5-point  increments,  the  form  indicated  the  percent 
of  soldiers  performing  at  or  below  that  score.  Thus,  each 
dimension  cutoff  could  be  expressed  in  terms  of  percent  GO  test 
scores  or  percentile  scores.  The  percentile  cutoff  metric  is 
equivalent  to  Behavioral  Incident  scoring. 
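The link between a percent GO cutoff and its percentile equivalent can be sketched as a lookup in the cumulative distribution printed on the form. Everything numeric below is hypothetical: the distribution is invented for illustration and is not Project A data, the names are illustrative, and the sketch assumes that the percentile cutoff for a level is the percent of soldiers scoring below the minimum percent GO score chosen for that level.

```python
# HYPOTHETICAL cumulative distribution (not Project A data):
# percent GO score -> percent of soldiers performing at or below that score.
CUMULATIVE_PERCENT = {
    55: 2, 60: 4, 65: 7, 70: 12, 75: 20,
    80: 33, 85: 50, 90: 68, 95: 85, 100: 100,
}


def percent_below(cut_score: int) -> int:
    """Percent of soldiers scoring below cut_score (5-point score increments)."""
    lower = [pct for score, pct in CUMULATIVE_PERCENT.items() if score < cut_score]
    return max(lower, default=0)


def category_shares(marginal: int, acceptable: int, outstanding: int) -> dict:
    """Percent of soldiers in each performance category for three percent GO cutoffs."""
    below_m = percent_below(marginal)
    below_a = percent_below(acceptable)
    below_o = percent_below(outstanding)
    return {
        "Unacceptable": below_m,
        "Marginal": below_a - below_m,
        "Acceptable": below_o - below_a,
        "Outstanding": 100 - below_o,
    }


# Example rater response: minimum percent GO scores of 70, 80, and 95.
print(category_shares(70, 80, 95))
```

With this invented distribution, cutoffs of 70, 80, and 95 percent GO correspond to percentile cutoffs of 7, 20, and 68, which is the metric shared with the Behavioral Incident scoring.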

Each  dimension  form  was  administered  separately,  with  raters 
working  as  a  group  for  part  of  the  session.  The  raters'  first 
task  was  to  evaluate  the  exemplar  tasks  and  determine  their 
relevance  to  the  MOS  being  rated.  For  tasks  that  were  not 
relevant,  raters  were  asked  to  think  of  a  task,  termed 
"substitute  task",  in  their  MOS  that  matched  the  characteristics 
and  difficulty  of  the  sample  task.  Then  as  a  group,  the  raters 
agreed  on  three  tasks,  which  might  be  any  combination  of  given 
tasks  and  substitute  tasks,  to  consider  as  representative  of  the 
dimension. Raters then were to consider the test scores and
performance distributions and set cutoffs between Unacceptable,
Marginal,  Acceptable,  and  Outstanding  performance.  Instrument 
instructions  and  workshop  leaders  explained  that  the  performance 
distributions  were  obtained  from  actual  test  data,  and  they 
explained  the  circumstances  under  which  that  data  was  collected. 
It  was  emphasized  that,  unlike  most  Army  performance  testing 
situations,  soldiers  were  given  no  advance  warning  and  therefore 
had  no  opportunity  to  prepare  for  the  tests. 

The  Phase  III  Task-Based  Form  was  originally  designed  for 
raters  to  write  on  three  separate  lines  the  three  percent  GO 
scores  that  represented  their  chosen  cut  points.  This  format  was 
revised after the workshops at the second data collection site.
In the revised design, raters drew lines on the form, graphically
indicating their chosen dividing lines. A Task-Based Standard
Setting  Form  is  presented  in  Appendix  A,  Attachment  3.  A 
complete  set  of  forms  may  be  found  in  Volume  II. 

Task  dimensions  were  matched  to  MOS  as  presented  in  Table 
5.1  of  the  chapter  introduction.  In  addition,  raters  in  each  MOS 
rerated  two  dimensions  after  a  feedback  discussion  session  was 
conducted.  As  part  of  the  group  discussion  session,  SMEs 
assisted  the  workshop  leader  in  tabulating  the  cutoff  points  set 
during  the  initial  rating  session.  The  discussion  focused  on  the 
variations  in  standards  for  each  performance  level  with  the 
workshop  leader  soliciting  reasons  for  strict  and  lenient 
standards. Dimensions were discussed and rerated separately.

Qualitative  Feedback 

As  indicated  above,  the  Task-Based  Standard  Setting  Form  had 
to  be  revised  after  the  workshops  at  the  first  two  data 
collection  sites.  For  several  reasons,  raters  were  confused  by 
the  questionnaire  response  format.  They  were  asked  to  indicate 
on  three  separate  blanks  three  unique  numbers  that  represented 
the  minimum  percent  GO  scores  for  Marginal,  Acceptable,  and 
Outstanding  performance.  Rater  errors  included  writing  in 
percentile  scores,  writing  in  percent  GO  scores  below  the  cutoff 
(i.e.,  the  top  of  the  next  lower  level),  and  using  the  same 
number  for  more  than  one  cutoff.  At  some  point  during  the 
workshops  at  the  second  data  collection  site,  one  of  the  workshop 
leaders  determined  that  it  would  be  more  efficient  to  have  raters 
draw  three  lines  on  their  forms  to  divide  the  four  performance 
groups.  Test  scores  and  performance  distribution  information  was 
presented  in  matching  columns  so  that  a  line  drawn  across  the 
page  showed  each  division  in  terms  of  test  score  and  distribution 
and  there  was  no  confusion  about  the  meaning  of  the  placement  of 
the  line.  In  addition,  the  performance  distribution  data  on  the 
forms  used  at  the  first  two  sites  was  incorrect.  Because  of  the 
incorrect  data  and  the  confusion  over  the  responses,  the  Task- 
Based  Standard  Setting  Form  data  from  the  first  two  data 
collection  sites  were  not  included  in  the  analyses  reported 
below.

Based on comments from the feedback discussion sessions, a
number  of  observations  were  made.  These  concern  the  various 
standard  setting  strategies  used  by  the  raters,  the  reference 
points  used  by  the  raters,  and  suggested  improvements  in 
questionnaire  format  and  administration. 

The  strategies  used  by  SMEs  to  establish  performance 
standards  using  the  Task-Based  exercise  fall  into  five 
categories.  These  categories  are  briefly  described  below. 

1.  Criticality.  A  criticality  strategy  was  used  to  set 
strict standards on life-threatening (i.e., Individual Combat)
and MOS-specific dimensions. For Individual Combat, SMEs felt
that standards should be strict because not only the
individual soldier's life but also the lives of several other
soldiers are at stake. For dimensions that are central to the
MOS, SMEs wanted to establish strict standards because
theoretically the soldier's specialized training qualifies him or
her as an expert in that area.

2.  Difficulty.  If  a  dimension  was  perceived  as  being 
particularly  difficult,  SMEs  were  willing  to  set  lenient 
standards.  Conversely,  SMEs  tended  to  set  strict  standards  for 
"easy"  dimensions. 

3. Traditional Standards. Many SMEs used traditional
notions  of  70,  80,  and  90%  correct  to  set  minimum  standards  for 
Marginal,  Acceptable,  and  Outstanding,  respectively.  The 
traditional  standards  strategy  seemed  to  be  used  more  frequently 
to  set  the  Marginal  cutoff  of  60  or  70%  correct  than  for  higher 
levels  of  job  performance.  Many  participants  stated  that  they 
felt  comfortable  with  a  60  or  70%  cutoff  because  those  are  so 
prevalently  used  in  high  school  as  well  as  in  the  Army. 

4.  Frequency  of  Performance.  SMEs  tended  to  set  stricter 
standards  for  task  dimensions  that  were  performed  frequently 
compared  to  those  performed  infrequently.  Even  if  a  task  was 
perceived  as  critical  or  difficult,  SMEs  were  willing  to  be 
somewhat  lenient  if  it  was  performed  infrequently.  However, 
leniency  tended  to  evidence  itself  at  the  Acceptable  and 
Outstanding  cutoffs  rather  than  at  the  Marginal  level.  For 
example,  throwing  grenades  or  loading,  clearing,  or  reducing 
stoppage  in  an  M16  rifle,  for  some  MOS,  are  performed 
infrequently  outside  of  basic  training.  Because  these  are 
critical  tasks,  SMEs  tended  to  set  a  higher  Marginal  cutoff  than 
for  less  critical  tasks.  However,  SMEs  were  often  lenient  in 
their  Acceptable  and  Outstanding  standards  for  these  tasks 
because  they  are  rarely  performed. 

5.  Normal  Distribution.  A  few  SMEs  wanted  to  set  standards 
so  that  20%  of  the  examinees  would  fall  in  the  Outstanding 
category,  20%  in  the  Unacceptable  category,  and  the  remaining  60% 
would be divided between the Acceptable and Marginal categories.
Because the Project A hands-on test scores are not normally
distributed and these particular percentile values did not fall
exactly on any of the 5-point increment test score options that
were presented, this strategy could not be fully implemented,
only approximated. A normal distribution strategy was expressed
by only a few SMEs, all at TRADOC installations.

In  addition  to  the  strategies  articulated  by  SMEs,  workshop 
leaders  observed  other  strategies  that  are  pertinent  to  the 
standard  setting  process.  One  such  observation  concerns  what  is 
referred  to  in  the  standard  setting  literature  as  "absolute" 
versus  "relative"  standards.  Should  standards  be  "etched  in 
stone",  or  should  they  be  adjusted  so  that  a  specified  percentage 
of  examinees  pass  and/or  fail?  These  theoretical  differences  are 
often labeled norm-referenced (i.e., "relative" standards) and
criterion-referenced  (i.e.,  "absolute"  standards). 

Theoretically,  these  differences  seem  to  be  incompatible.  In 
practice,  however,  standard  setting  decisions  reflect  a 
combination  of  the  two  philosophies  (Shephard,  1980).  The 
merging  of  these  theoretical  differences  presented  itself  in 
discussions  with  workshop  participants. 

A  norm-referenced  paradigm  was  most  likely  to  evidence 
itself  when  standards  were  set  on  the  Individual  Combat 
dimension.  In  that  case,  SMEs  tended  to  set  a  minimum  cutoff 
(i.e., Marginal) at the point at which 50% or more of the
examinees  passed.  Their  rationale  was  that  in  combat  they  wanted 
at  least  half  of  the  troops  to  survive.  By  setting  a  minimum 
performance  score  at  that  point,  they  felt  that  they  increased 
their  chances  of  attaining  that  goal. 

The  criterion-referenced  paradigm  manifested  itself  when 
SMEs  either  consciously  or  unconsciously  decided  not  to  use  the 
normative  data  to  establish  their  standards.  Some  SMEs  ignored 
the  normative  data  because  they  could  not  figure  out  how  to  use 
it.  Their  confusion  seemed  to  be  due  to  the  individuals' 
inability  to  interpret  the  data  rather  than  to  a  lack  of  clarity 
in the instructions, given that other SMEs in the same session did
use  the  data.  Some  SMEs  made  a  conscious  decision  to  ignore  the 
normative  data  because  they  wanted  soldiers  to  perform  at  a  given 
level  regardless  of  any  implications  about  the  number  of  soldiers 
whose  performance  would  fall  into  a  particular  category.  Some  of 
the  latter  group  of  SMEs  realized  that  by  setting  high 
performance  standards  they  could  influence  the  quality  of 
soldiers  selected  for  their  MOS. 

Another  issue  raised  in  the  Task-Based  exercise  concerns 
exactly  what  SMEs  concentrated  on  when  setting  their  individual 
standards.  In  other  words,  did  SMEs  focus  on  (a)  a  single  sample 
task,  (b)  the  dimension  as  defined  by  the  three  sample  tasks,  or 
(c)  the  dimension  as  defined  by  all  tasks  (i.e.,  the  sample  tasks 
plus  others)?  According  to  the  instructions,  raters  were  to  set 
standards  on  dimensions  of  job  performance  by  focusing  on  three 
sample  tasks  selected  to  represent  that  dimension.  During  the 
discussion, some SMEs commented on only one sample task for a
particular dimension. This behavior would lead one to believe
that rather than cognitively combining standards for the three
sample tasks to yield a dimension standard, those SMEs actually
set standards on only one task. At some workshops, SMEs were
asked whether they focused on the dimension or the sample tasks.
The  responses  were  spread  about  evenly  across  the  two  choices. 

In  hindsight,  however,  the  term  "dimension"  was  not  clearly 
defined.  In  other  words,  "dimension"  could  mean  the  three  sample 
tasks  only,  or  it  could  mean  the  three  sample  tasks  plus  all 
other  tasks  within  the  dimension.  Given  the  ambiguous  nature  of 
the  question,  reliable  inferences  cannot  be  drawn  regarding  the 
SMEs' definitions of dimension. Because no manipulation check
was made to ensure that SMEs centered on nothing more or less
than the three sample tasks, one cannot be sure of the cognitive
processes used to set standards on a "dimension."

As  with  the  Behavioral  Incident  exercise,  SMEs  had  ideas  on 
how  to  improve  the  Task-Based  exercise.  Several  experts  wanted 
finer  gradations  of  the  percent  GO  scale,  particularly  at  the 
top.  For  many  dimensions,  the  Outstanding  cutoffs  were  set  at 
95% correct, which resulted in classifying anywhere from 13% of
the examinees for Dimension D (Vehicle and Equipment Operations)
to 36% for Dimension N (Crew-served Weapons) as
Outstanding.  Many  SMEs  felt  that  grouping  as  many  as  one-fourth 
of  the  soldiers  into  an  Outstanding  category  reduced  the  prestige 
of  an  Outstanding  rating  and  made  the  standards  too  lenient. 

Some  SMEs  expressed  a  desire  for  more  information  on  the  way 
the  hands-on  tests  were  scored.  They  did  not  want  to  see  an 
actual  test;  they  merely  wanted  to  know  more  about  the  scoring 
system. They are familiar with performance test scoring in which
each step is scored dichotomously (GO vs. NO-GO) and the final
score is also a dichotomous GO or NO-GO, depending on whether all steps
were performed correctly and in the proper sequence. In many
Army  performance  tests,  a  NO-GO  score  on  a  single  step  results  in 
a  failing  score  (i.e.,  a  final  NO-GO  score)  regardless  of  the 
number  of  steps  performed  correctly.  Many  SMEs  assumed  this  type 
of  scoring  system  for  the  hands-on  tests.  Some  assumed  their  own 
scoring  system  in  which  a  soldier  received  an  overall  GO  even  if 
he  or  she  performed  some  steps  incorrectly  or  out  of  sequence  but 
the  final  product  was  acceptable.  SMEs  who  admitted  assuming  the 
latter  scoring  system  (i.e.,  a  more  lenient  system)  set  stricter 
standards  than  those  who  admitted  assuming  the  relatively  strict 
Army scoring system. It may seem obvious to researchers that the
percent GO scores represent the percentage of steps performed
correctly as opposed to a single, final dichotomous score.
However, this apparently was not obvious to the standard setting
judges. A related source of SME uncertainty about the scoring
system  concerned  the  amount  of  detail  that  was  scored.  SMEs  who 
expressed  the  assumption  that  the  test  probably  included  scoring 
of  trivial  steps  tended  to  argue  for  lower  cutoffs.  If  the  Task- 
Based  exercise  is  operationally  used  as  a  standard  setting 
procedure,  details  regarding  the  scoring  system  for  the  hands-on 
tests  may  need  to  be  included  in  the  instructions. 
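
The difference between the two scoring systems that SMEs assumed can be illustrated with a minimal sketch. The function names and the ten-step example are hypothetical and are not taken from the Project A scoring materials, and step sequence is ignored for simplicity.

```python
def percent_go(steps):
    """Percentage of steps scored GO (True), as in a percent GO test score."""
    return 100.0 * sum(steps) / len(steps)


def all_or_nothing(steps):
    """Overall GO only if every step is GO, as in the strict scoring SMEs described."""
    return "GO" if all(steps) else "NO-GO"


# Hypothetical soldier: 9 of 10 steps performed correctly.
steps = [True] * 9 + [False]

print(percent_go(steps))      # 90.0
print(all_or_nothing(steps))  # NO-GO
```

Under these assumptions the same performance is a 90 on the percent GO scale but a failure under all-or-nothing scoring, which is why the assumed scoring system could plausibly shift the cutoffs an SME chose.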

SMEs felt that the emphasis on the fact that examinees had
no  time  to  prepare  for  the  hands-on  tests  should  be  maintained  or 
even  strengthened.  In  the  Army,  test  dates  are  scheduled  well  in 
advance  thereby  allowing  ample  opportunity  for  the  soldier  to 
study  and  prepare  for  that  test.  The  hands-on  tests  were  unique 
in  that  soldiers  received  no  forewarning  as  to  the  nature  of  the 
tests,  the  tasks  to  be  tested,  etc.  Because  our  unscheduled 
hands-on  tests  deviated  from  standard  Army  practice,  SMEs  felt 
that  this  point  should  be  well-articulated. 

Some experts wanted more information about the normative
population; specifically, they wanted to know how the distribution
of scores would look for only their MOS. For MOS-specific
dimensions,  several  experts  wanted  to  set  strict  standards  if  the 
data  were  obtained  from  soldiers  in  an  MOS  that  does  not  normally 
perform  tasks  in  that  dimension. 

Data  Editing 

As  noted  above,  all  of  the  Task-Based  Standard  Setting  Form 
responses  from  the  first  two  data  collection  sites  were 
eliminated  from  the  analyses.  In  addition,  questionnaire 
responses were screened to ensure that three cutoff points were
indicated.  If  three  lines  were  not  drawn,  we  could  not 
unequivocally  match  lines  to  divisions  between  performance 
levels.  Table  5.13  indicates  the  number  of  rating  sheets  dropped 
from  our  analyses. 


Table 5.13

Number of Task-Based Standard Setting Forms Dropped During Edits

                                        Number of Forms Dropped
Dimension                               Initial Rating   Rerating

B. Electrical & Electronic Maint.              2             4
D. Vehicle & Equipt. Operations               12            10
H. Clerical                                   18             2
I. Communications                              4             2
M. Individual Combat                          13            22
N. Crew-Served Weapons                         5             -

Descriptive Statistics


Tables  5.14  and  5.15  present  means  and  standard  deviations 
across  raters  for  the  standards  set  by  each  MOS,  by  each 
dimension,  for  both  initial  rating  (called  Trial  1  ratings)  and 
rerating  (called  Trial  2  ratings).  Table  5.14  presents  standards 
in  terms  of  percent  GO  test  scores.  Table  5.15  presents  the 
standards  in  terms  of  percentile  scores.  As  with  the  Behavioral 
Incident  data,  the  standard  deviations  in  these  tables  are  one 
index of rater agreement.

Because  the  percentile  metric  is  the  primary  index  for 
linking  performance  standards  to  selection  standards,  the 
remaining  analyses  focus  on  this  metric.  Table  5.16  presents  a 
summary  across  MOS  of  the  within-MOS  percentile  means  (i.e.,  the 
standards)  and  standard  deviations  for  each  dimension.  For 
example,  four  MOS  rated  Dimension  B  (Electrical  and  Electronic 
Maintenance).  The  mean  cutoff  for  Marginal  performance  indicates 
that  18.85%  of  all  soldiers  may  be  expected  to  fall  below 
Marginal. The average within-MOS standard deviation for that
cutoff was 14.22. Only two MOS rerated Dimension B. For the
Dimension  B  rerate,  the  average  Marginal  cutoff  was  18.62,  and 
the  average  within-MOS  standard  deviation  was  10.19.  At  the 
bottom  of  the  table,  the  means  for  these  within-MOS  statistics 
are  presented  averaged  across  all  MOS-by-dimension  combinations. 
Initial  rating  means  are  presented  for  the  full  set  of  data  and 
for  the  MOS-by-dimension  cases  that  match  the  rerate  data  set. 

Examining  these  data  suggests  that  neither  standard  levels 
(i.e.,  within-MOS  means)  nor  rater  agreement  (i.e.,  within-MOS 
standard  deviations)  change  greatly  from  the  initial  rating  to 
the  rerating.  Table  5.17  presents  tests  of  the  differences 
between  initial  rating  and  rerating  standards  and  rater  agreement 
for  Marginal,  Acceptable,  and  Outstanding.  Tests  were  conducted 
using  a  repeated  measures  ANOVA  with  the  22  MOS-by-dimension 
combinations  as  cases  and  the  within-MOS  statistics  as  the 
repeated  "trials."  Results  indicate  that  the  discussion  and 
rerating  process  appears  to  influence  the  cutoffs  for  all  three 
levels  of  standards  but  it  only  affects  rater  agreement  for  the 
Acceptable  cutoff  level.  Closer  inspection  of  the  data  at  the 
bottom  of  Table  5.16  indicates  that  the  rerating  process  leads  to 
standards  that  are  about  two  points  higher  at  each  cutoff.  As 
with  the  Behavioral  Incident  Questionnaire,  standard  deviations, 
when  converted  to  standard  errors  of  the  means,  suggest  reliable 
ratings.  Assuming  60  raters,  standard  errors  of  the  mean  are 
estimated  at  1.4  to  1.9  for  the  across  dimensions  data  reported 
at  the  bottom  of  Table  5.16. 
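
The conversion from a standard deviation across raters to a standard error of the mean can be sketched as follows. The two standard deviations used here are hypothetical values chosen for illustration; they are not the entries of Table 5.16. Only the count of 60 raters comes from the text.

```python
import math


def standard_error_of_mean(sd, n_raters):
    """Standard error of a mean cutoff: SD across raters divided by sqrt(n)."""
    return sd / math.sqrt(n_raters)


# Hypothetical within-MOS standard deviations (percentile points).
for sd in (11.0, 14.5):
    print(sd, round(standard_error_of_mean(sd, 60), 1))
```

With 60 raters, standard deviations of roughly 11 to 14.5 percentile points correspond to standard errors of about 1.4 to 1.9.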

[Table 5.14. Task-Based Standard Setting Form: Minimum Cutoff Percent GO Scores by MOS, Dimension. Columns: MOS, Dimension, and, for each of Trial 1 and Trial 2, Sample plus X and SD for the Marginal, Acceptable, and Outstanding cutoffs. Table body not recovered.]

[Table 5.15. Task-Based Standard Setting Form: Minimum Cutoff Percentile Scores at Each Level by MOS. Table body not recovered.]

Table 5.17

Planned Comparison for Trial Effects on MOS Cut Score Means and
Variances Using Repeated Measures ANOVA

Source                      SS      df       MS         F        p

Trial Effects on Cut Score Means:

  Marginal Cut           119.996     1    119.996    16.223    0.001
    Error                155.333    21      7.397
  Acceptable Cut         221.076     1    221.076    29.939    0.000
    Error                155.06     21      7.384
  Outstanding Cut         52.360     1     52.360    20.197    0.000
    Error                 54.442    21      2.592

Trial Effects on Cut Score Standard Deviations
(within-MOS variation):

  Marginal Cut             0.606     1      0.606     0.224    0.641
    Error                 56.849    21      2.707
  Acceptable Cut          25.166     1     25.166     8.513    0.008
    Error                 62.077    21      2.956
  Outstanding Cut          0.078     1      0.078     0.073    0.790
    Error                 22.499    21      1.071

Note. Each case is represented by an MOS-by-Dimension; means and
standard deviations for ratings within each MOS are the
variables.
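
As a check on how the entries of Table 5.17 relate to one another, each F statistic should equal the effect mean square divided by its error mean square. The sketch below recomputes F for the three cut score means from the printed mean squares. The dictionary and function names are illustrative only.

```python
# (MS effect, MS error, F as printed in Table 5.17) for the cut score means.
TABLE_5_17_MEANS = {
    "Marginal": (119.996, 7.397, 16.223),
    "Acceptable": (221.076, 7.384, 29.939),
    "Outstanding": (52.360, 2.592, 20.197),
}


def f_ratio(ms_effect, ms_error):
    """F statistic as the ratio of effect to error mean squares."""
    return ms_effect / ms_error


for level, (ms_effect, ms_error, printed_f) in TABLE_5_17_MEANS.items():
    computed = f_ratio(ms_effect, ms_error)
    print(f"{level}: computed F = {computed:.2f}, printed F = {printed_f}")
```

The recomputed ratios agree with the printed values to within rounding of the mean squares.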

As with the other Phase III instruments, reliability
estimates were computed for each rater group (e.g., TRADOC NCOs,
TRADOC Officers), for rater groups combined within command (e.g.,
TRADOC total), for rater groups combined across commands (e.g.,
combined NCOs), and for all raters across rank and command. The
reliability estimation procedure was identical to that used for
the Army Task Questionnaire and Behavioral Incident Standard
Setting Questionnaire. Separate estimates were computed for each
dimension rated within each MOS. Table 5.18 presents single-rater
and overall reliability estimates calculated using
percentile scores. Table 5.19 presents means for the single-rater
reliability estimates, first by dimension across the
relevant MOS, and then across all MOS-by-dimension combinations.

Several observations may be made about these reliability
estimates, assisted again by a series of ANOVAs. First, using
only the reliabilities for combined ranks and commands (i.e., the
next to the last column in Table 5.18), differences in
reliability among the task dimensions and between initial and
rerating sessions were compared. Dimension differences in
reliability estimates were significant (F(4,37) = 4.49, p < .01) but
repetition differences were not (F(1,40) = 1.32, ns). Dimensions
differ in their reliability estimates from .70 for Electrical and
Electronic Maintenance initial ratings to .81 for Individual
Combat initial ratings with all raters combined. Second, using
the reliabilities for combined commands, differences by rank were
not significant (F(2,110) = 0.10, ns); and third, using the
reliabilities for combined ranks, differences by command were not
significant (F(1,88) = 0.46, ns).

Fourth,  the  reliability  estimates  of  the  combined  rater 
groups  show  no  decrement  from  the  reliability  estimates  for  the 
separate  groups.  For  example,  the  average  reliability  for  all 
TRADOC  raters  is  .81,  the  average  reliability  for  all  FORSCOM 
raters is .80, and the reliability across all raters is .80.
Thus, there is no suggestion that the rater groups are providing
significantly different cutoffs.

Last  and  most  obvious,  these  single-rater  reliability 
estimates  are  quite  high.  Of  course,  the  rating  format  of 
drawing  lines  for  the  cutoffs  guarantees  at  least  ordinal 
agreement  among  the  raters.  Still,  using  .80  as  the 
representative  single-rater  estimate,  with  60  raters  the  overall 
reliability  would  be  .996.  Using  the  rerating  cutoff  averaged 
across  all  MOS-by-dimension  combinations,  the  standard  deviation 
across  cutoff  may  be  estimated  as  21.9.  Combining  those 
estimates yields an overall projection for standard error of
measurement  of  1.4.  This  is  congruent  with  the  error  estimate 
based  on  within-MOS  standard  deviations  for  the  cutoff  levels 
(i.e.,  1.4  to  1.9). 
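
The arithmetic behind these projections can be sketched under one assumption: that the step from a single-rater reliability to a 60-rater reliability uses the Spearman-Brown formula. The text does not name the formula, but it reproduces the reported .996. The standard error of measurement is then the standard deviation times the square root of one minus the reliability. Function names are illustrative.

```python
import math


def spearman_brown(single_rater_r, k):
    """Projected reliability of the mean of k raters (Spearman-Brown, assumed)."""
    return k * single_rater_r / (1 + (k - 1) * single_rater_r)


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)


overall_r = spearman_brown(0.80, 60)
sem = standard_error_of_measurement(21.9, overall_r)

print(round(overall_r, 3))  # 0.996
print(round(sem, 1))        # 1.4
```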

Differences Among Standard Setting Cutoffs


For  the  Task-Based  Standard  Setting  instrument,  there  are 
five  variables  that  can  potentially  impact  on  the  standards 
selected.  These  include  rater  group  differences  in  rank  and 
command,  differences  among  the  MOS,  differences  among  the 
dimensions,  and  differences  between  initial  ratings  versus  the 
reratings.  This  section  examines  these  potential  differences 
when  the  standards  are  expressed  in  terms  of  percentile  scores. 
The  analysis  procedure  was  similar  to  the  parallel  analysis 
conducted  for  the  Behavioral  Incident  Questionnaire.  That  is, 
each rater-by-dimension-by-repetition questionnaire was treated
as a case with the three performance levels treated as three
"trials" in a repeated measures ANOVA. The variables of
interest (rank, command, MOS, dimension, and repetition) were
treated as grouping factors. Again, the between-subjects effects
represent differences in strictness and leniency for the three
cutoffs combined. The within-subjects interactions test the
consistency  of  those  effects  across  the  three  cutoff  points. 
Civilians  were  excluded  from  the  analysis. 

Table  5.20  presents  the  results  of  this  repeated  measures 
ANOVA.  Four  of  the  variables  (repetition,  MOS,  dimension,  and 
rank) show statistically significant between-subjects effects.
Furthermore, all five variables show statistically significant
within-subjects interactions. Because of these interactions,
separate ANOVAs were calculated for each of the three cutoff
points. These results are presented in Table 5.21. They
indicate that repetition, MOS, dimension, and rank affect the
standards set for all three performance levels. Command, on the
other hand, affects cutoff points only for the Marginal and
Acceptable  levels.  Multiple  Rs  for  these  levels  are  .32,  .29, 
and  .28  for  the  Marginal,  Acceptable,  and  Outstanding  levels, 
respectively.  Based  on  the  sums-of-squares,  the  dimension 
effects  appear  to  be  the  most  influential. 

Figures  5.6  through  5.15  graphically  depict  the  results 
presented  in  Tables  5.20  and  5.21.  Because  many  SMEs  chose  to 
think  about  their  ratings  only  in  terms  of  test  scores,  the 
figures  present  differences  on  test  scores  as  well  as  on 
percentile  scores.  In  each  figure,  standards  are  plotted  for  the 
indicated  variable  based  on  averages  across  the  remaining 
variables.  For  example,  MOS  standards  are  computed  using  all 
raters,  repetitions,  and  dimensions  relevant  to  the  MOS. 

Considering  these  differences  in  light  of  the  analogous 
differences  for  the  Behavioral  Incident  Questionnaire,  the  MOS 
and  dimension  differences  with  the  Task-Based  method  are  more 
apparent.  This  suggests  that  the  standards  derived  for  the  Task- 
Based  instrument  may  be  less  generalizable  than  those  from  the 
Behavioral  Incident  Questionnaire. 

Table 5.20

Rating Repetitions, MOS, Task Dimension, Rater Rank, and Rater
Command Effects on Three Levels of Task-Based Standards

Source                          SS        df        MS           F        p

Between-Subjects Effects:

  Repetition                 6692.904      1     6692.904     16.450    0.000
  MOS                       20430.322     10     2043.032      5.022    0.000
  Dimension                161419.533      5    32283.907     79.350    0.000
  Rank                      24134.890      1    24134.890     59.321    0.000
  Command                    1158.263      1     1158.263      2.847    0.092
  Subjects w. groups       885723.180   2177      406.855

Within-Subjects Effects:

  Level                   1027665.641      2   513832.821   8856.447    0.000
  Repetition X Level          382.187      2      191.094      3.294    0.033
  MOS X Level               10452.031     20      522.602      9.008    0.000
  Dimension X Level         33624.534     10     3362.453     57.955    0.000
  Rank X Level                703.673      2      351.837      6.064    0.008
  Command X Level             615.726      2      307.863      5.306    0.015
  Level X Subj. w. groups  252610.110   4354       58.018

Table 5.21

Rating Repetitions, MOS, Task Dimension, Rater Rank, and Rater
Command Effects on Each Level of Task-Based Standards

Source                  SS        df        MS          F        p

Marginal Cutoff:

  Repetition         1972.524      1     1972.524    10.689    0.001
  MOS               16963.413     10     1696.341     9.192    0.000
  Dimension         88327.072      5    17665.414    95.727    0.000
  Rank               4628.751      1     4628.751    25.083    0.000
  Command             927.558      1      927.558     5.026    0.025
  Error            401742.260   2177      184.539

Acceptable Cutoff:

  Repetition         3874.983      1     3874.983    19.695    0.000
  MOS                8108.024     10      810.802     4.121    0.000
  Dimension         92687.551      5    18537.510    94.220    0.000
  Rank              10085.702      1    10085.702    51.262    0.000
  Command             846.076      1      846.076     4.300    0.038
  Error            428317.892   2177      196.747

Outstanding Cutoff:

  Repetition         1227.585      1     1227.585     8.669    0.003
  MOS                5810.916     10      581.092     4.104    0.000
  Dimension         14029.444      5     2805.889    19.815    0.000
  Rank              10124.110      1    10124.110    71.496    0.000
  Command               0.355      1        0.355     0.003    0.960
  Error            308273.137   2177      141.605

Figure 5.8. Task-Based percentile cutoffs for each task dimension.

Figure 5.9. Task-Based test score cutoffs for each task dimension.
(See Table 5.1 for Task Dimension names.)

Figure 5.10. Task-Based percentile cutoffs for each rating
repetition.

Figure 5.11. Task-Based test score cutoffs for each rating
repetition.

Figure 5.12. Task-Based percentile cutoffs for NCOs and Officers.

Figure 5.13. Task-Based test score cutoffs for NCOs and Officers.

Figure 5.14. Task-Based percentile cutoffs for TRADOC and FORSCOM.

Figure 5.15. Task-Based test score cutoffs for TRADOC and FORSCOM.

Also of interest is the difference in this ordering between
percentile cutoffs and test score cutoffs seen by comparing
Figures 5.8 and 5.9. Because of the differences in Project A
test performance distributions, the ordering of cutoff levels is
almost reversed among the dimensions. Dimensions B (Electrical
and Electronic Maintenance), M (Individual Combat), and N
(Crew-served Weapons) have the highest cutoffs in terms of test scores
but the lowest in terms of percentile cutoffs. Clearly, if SMEs
are trying to express the need for higher quality soldiers by
focusing on test score cutoffs, they may not be reaching their
objective. On the other hand, the information at hand is
incomplete in that the linkage of the performance distributions
to the predictor distributions and the translation of performance
cutoffs to predictor cutoffs are not available. Without that
information, it is not possible to definitively compare the
leniency in standards across the different dimensions.
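
The reversal in ordering can be illustrated with a minimal sketch that converts a test score cutoff into a percentile cutoff, taken here as the percentage of examinees scoring below the cutoff. The two score distributions are hypothetical and are not Project A data. They stand in for a dimension on which most soldiers score high and one on which scores are spread lower.

```python
from bisect import bisect_left


def percentile_below(scores, cutoff):
    """Percentage of examinees scoring strictly below the cutoff."""
    ordered = sorted(scores)
    return 100.0 * bisect_left(ordered, cutoff) / len(ordered)


# Hypothetical percent GO scores for ten examinees on each dimension.
high_scoring_dimension = [70, 80, 85, 90, 90, 95, 95, 100, 100, 100]
low_scoring_dimension = [40, 50, 55, 60, 65, 70, 75, 80, 85, 90]

# A higher test score cutoff on the high-scoring dimension ...
print(percentile_below(high_scoring_dimension, 90))  # 30.0
# ... excludes fewer examinees than a lower cutoff on the other dimension.
print(percentile_below(low_scoring_dimension, 80))   # 70.0
```

In this example a cutoff of 90 on one dimension is more lenient in percentile terms than a cutoff of 80 on the other, which is the pattern described for Dimensions B, M, and N.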

Other  statistically  significant  differences  include  the 
initial  versus  rerate  differences  that  were  also  observed  in  the 
MOS  level  statistics.  Rerate  cutoffs  are  slightly  higher  than 
initial  cutoffs  although  the  differences  are  unimpressive.  There 
also appear to be observable rank differences, with Officers being
the most strict and NCOs the most lenient. Finally, as suggested
by  the  ANOVA  results,  the  command  differences  are  least 
remarkable. 

Of the above differences in cutoffs, the MOS and task
dimension differences are the only ones large enough to be
treated as different in practical terms. That means that rater
group differences can be ignored. Thus, selection of raters
falls back to the constituency issue discussed previously in
Chapter 3. Selection of raters should primarily be driven by
ensuring representativeness for the sake of representativeness.

MOS  and  dimension  differences  in  standards  may  be  too  large 
to  ignore.  It  is  important  to  note  that  the  MOS  effects  are 
significant independent of the dimension effects and vice versa.
On the other hand, dimensions were selected for MOS and
therefore, by design, the two factors are confounded. Because of
the confounding, general linear model analyses of any MOS-by-dimension
interaction fail. Table 5.22 presents a summary of
repeated measures ANOVA results conducted separately for each
task dimension. These results show significant MOS main effects
and/or significant MOS-by-cutoff level interaction effects for
each  dimension.  (MOS-by-dimension  means  are  presented  in  Table 
5.15.)  Taken  as  a  whole,  the  pattern  suggests  that  the  MOS  set 
different  levels  of  standards  and  that  these  differences  are  not 
just  a  function  of  the  dimensions  being  rated.  Rather,  the 
standards  set  are  a  complex  function  of  both  MOS  and  dimension. 
Thus,  standards  set  for  one  MOS  are  unrelated  to  standards  set 
for  other  MOS.  Furthermore,  standards  set  for  one  dimension  are 
unrelated  to  standards  set  for  other  dimensions. 

Table 5.22

Summary of the Effects of Rating Repetitions, MOS, Rater Rank,
and Rater Command on Task-Based Standards Separately for
Each Task Dimension

Column groups: Between-Subjects (Repet, MOS, Rank, Command) and
Within-Subjects (Repet, MOS, Rank, Command)

Dimension   Significant effects (column positions not recovered)

B           X X
D           X X X X
H           X X
I           X X X X X X
M           X X X X

Note. X = Significant effect (p < .05).


Dimension            Number of MOS    Number of Rating Sheets

B. Electrical              4                  175
D. Driving                 5                  480
H. Clerical                4                  203
I. Communication           5                  325
M. Ind. Combat            11                  952

Comparison  of  Behavioral  Incident  and  Task-Based  Standards 

Table  5.23  presents  mean  standards  derived  from  the 
Behavioral  Incident  Standard  Setting  Questionnaire  and  the  Task- 
Based  Standard  Setting  Form.  The  means  were  calculated  from  the 
MOS-by-dimension data (n = 30) presented in Tables 5.4 and 5.14.
The  Task-Based  means  were  calculated  from  rerate  standards  for 
those  MOS-by-dimension  combinations  with  rerate  data  and  initial 
standards  otherwise. 

The  means  across  MOS  and  task  dimension  in  Table  5.23 
indicate  that  the  two  standard  setting  instruments  are  not 
producing  the  same  results.  These  differences  were  tested  once 
again  using  a  repeated  measures  approach.  The  30  unique  MOS-by- 
dimension  combinations  were  treated  as  cases  and  the  cutoff 
values  as  repeated  measures  "trials."  In  this  case,  standard 
setting  method  (Behavioral  Incident  vs.  Task-Based)  and  level  of 
standard  (Marginal,  Acceptable,  and  Outstanding)  were  treated  as 
two  trials  factors  (i.e.,  partitions  for  the  within-subjects 
trials  variable).  MOS  and  dimension  were  treated  as  grouping 
factors (i.e., between-subjects independent variables). Table
5.24 presents the results. Method differences are substantiated
with Task-Based standards generally higher than Behavioral
Incident standards. This conclusion is tempered by the
significant interactions with MOS and task dimension. The
MOS-by-Method-by-Level interaction is the only one that is not
significant. The interactions indicate that standard setting
levels are influenced by a complex mix of the methods of rating
and the MOS and task dimensions being rated.

Table 5.23

Mean Standards Set by Behavioral Incident and Task-Based Standard
Setting Instruments

Standard Setting Instrument    Marginal   Acceptable   Outstanding

Behavioral Incident               6.03       38.91        77.57
Task-Based                       29.00       49.11        80.17

Table 5.24

Tests of the Differences in Standards Set by Behavioral Incident
Versus Task-Based Questionnaires

Source                          SS       df       MS           F         p

Between-Subjects Effects:

  MOS                         417.808    10      41.781       4.519    0.005
  Dimension                  1906.667     5     381.333      41.246    0.000
  Subject w. groups           129.435    14       9.245

Within-Subjects Effects:

  Method                     2908.632     1    2908.632     330.257    0.000
  MOS x Method                305.607    10      30.561       3.470    0.017
  Dimension x Method          803.222     5     160.644      18.240    0.000
  Method x Subjects
    w. groups                 123.301    14       8.807

  Level                     58159.617     2   29079.808   25728.220    0.000
  MOS x Level                 220.991    20      11.050       9.776    0.000
  Dimension x Level           139.798    10      13.980      12.369    0.000
  Level x Subjects
    w. groups                  31.648    28       1.130

  Method x Level             1807.062     2     903.531     456.794    0.000
  MOS x Meth x Level           48.513    20       2.426       1.226    0.304
  Dim x Meth x Level          448.238    10      44.824      22.661    0.000
  Meth x Level x
    Subjects w. groups         55.384    28       1.978

Differences  between  the  standard  setting  methods  with 
respect  to  within-MOS  rater  agreements  (i.e.,  within-MOS  standard 
deviations)  were  also  examined.  Table  5.25  presents  the  average 
within-MOS  standard  deviation  for  each  instrument  for  each 
performance  level  cutoff.  Given  that  lower  standard  deviations 
indicate  higher  rater  agreement,  the  Behavioral  Incident  method 
appears  to  produce  greater  agreement.  The  repeated  measures 
results  presented  in  Table  5.26  support  those  conclusions  with  a 
significant  main  effect  for  method.  The  level  main  effects  and 
interactions  also  suggest  that  the  raters  have  greater  agreement 
for some levels of performance than for other levels.

Army  Task  Questionnaire  Difficulty 


The  Difficulty  scale  from  the  Army  Task  Questionnaire  was 
explored  for  its  relevance  to  the  standard  setting  problem.  In 
providing  Difficulty  ratings,  SMEs  were  asked  to  consider  how 
long  it  takes  to  learn  MOS  relevant  tasks  within  each  task 
category  and  how  often  those  tasks  must  be  practiced  in  order  to 
be  retained.  MOS  with  more  difficult  tasks  may  require  higher 
ability  soldiers.  Thus,  one  might  expect  to  find  a  positive 
relationship  between  task  difficulty  and  standards,  opening  the 
potential  for  difficulty  ratings  to  be  surrogates  for  standards. 
Alternatively, based on SME comments during the Task-Based
discussion  sessions,  one  could  anticipate  that  SMEs  would  be  more 
lenient  in  the  standards  they  set  for  tasks  that  are  more 
difficult.  This  would  lead  to  a  negative  relationship  between 
difficulty  and  standards. 

To  examine  the  relationship  between  difficulty  and 
standards,  difficulty  values  were  created  for  each  MOS  for  the 
relevant  task  dimensions.  This  required  consolidating  MOS  mean 
values  on  the  96  task  categories  into  values  for  the  17  lettered 
dimensions  that  subsume  the  categories  (see  Figure  3.1).  A 
procedure  was  applied  to  the  MOS  difficulty  mean  profiles  that 
first  defined  as  relevant  only  task  categories  with  mean  Core 
Technical  or  General  Soldiering  Importance  ratings  greater  than 
or  equal  to  3.0.  Then,  from  those  relevant  task  categories 
within  each  dimension,  the  maximum  difficulty  value  was  selected 
to  represent  the  difficulty  of  that  dimension.  Because  all  MOS 
set standards on Task Dimension M (Individual Combat), a
difficulty value for Dimension M was created for all MOS
regardless  of  Core  Technical  or  General  Soldiering  Importance 
ratings.  Dimensions  with  no  relevant  task  categories  for  a 
particular  MOS  were  given  no  difficulty  values.  The  resulting 
difficulty  values  for  all  dimensions  are  presented  in  Table  5.27. 
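
The consolidation procedure described above can be sketched as follows. The field names, the function name, and the sample ratings are hypothetical. The handling of Dimension M (taking the maximum over all of its categories regardless of importance) is one reading of the text and is an assumption.

```python
def dimension_difficulty(categories, always_include=False, threshold=3.0):
    """Difficulty value for one dimension within one MOS.

    categories: list of dicts holding the MOS mean 'difficulty',
    'core_technical', and 'general_soldiering' ratings for each task
    category in the dimension. A category is relevant when either
    importance rating is at least the threshold. always_include=True
    models Dimension M, which received a value for every MOS.
    Returns None when no category is relevant.
    """
    relevant = [
        c for c in categories
        if always_include
        or max(c["core_technical"], c["general_soldiering"]) >= threshold
    ]
    if not relevant:
        return None
    return max(c["difficulty"] for c in relevant)


# Hypothetical MOS mean ratings for three task categories in one dimension.
categories = [
    {"difficulty": 2.5, "core_technical": 3.2, "general_soldiering": 1.0},
    {"difficulty": 4.1, "core_technical": 2.0, "general_soldiering": 2.9},
    {"difficulty": 3.0, "core_technical": 1.5, "general_soldiering": 3.0},
]

print(dimension_difficulty(categories))                       # 3.0
print(dimension_difficulty(categories, always_include=True))  # 4.1
```

In this example the most difficult category (4.1) is excluded because neither importance rating reaches 3.0, so the dimension takes the maximum of the remaining relevant categories.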


Table 5.25

Rater Agreement (Within-MOS Standard Deviations Across Raters)
for Behavioral Incident and Task-Based Standard Setting
Instruments

Standard Setting Instrument    Marginal   Acceptable   Outstanding

Behavioral Incident               7.51        8.00        12.31
Task-Based                       13.18       14.06        12.79

Table 5.26

Tests of the Differences in Rater Agreement for Behavioral
Incident Versus Task-Based Instruments

Source                          SS      df      MS         F        p

Between-Subjects Effects:

  MOS                        160.676    10    16.068     8.231    0.000
  Dimension                  181.841     5    36.368    18.630    0.000
  Subject w. groups           27.330    14     1.952

Within-Subjects Effects:

  Method                     483.924     1   483.924    48.320    0.000
  MOS x Method                80.520    10     8.052     0.804    0.629
  Dimension x Method         123.532     5    24.706     2.467    0.084
  Method x Subjects
    w. groups                140.211    14    10.015

  Level                      138.827     2    69.413    38.139    0.000
  MOS x Level                 95.493    20     4.775     2.623    0.009
  Dimension x Level          260.693    10    26.069    14.324    0.000
  Level x Subjects
    w. groups                 50.960    28     1.820

  Method x Level              87.327     2    43.664    26.110    0.000
  MOS x Meth x Level          67.620    20     3.381     2.022    0.043
  Dim x Meth x Level         134.413    10    13.441     8.038    0.000
  Meth x Level x
    Subjects w. groups        46.825    28     1.672

[Table 5.27. Task Dimension Difficulty Values for Each MOS. Table body not recovered.]

These difficulty values were compared to the Behavioral
Incident  and  Task-Based  percentile  standards  by  correlating 
difficulty  values  and  standards  across  the  30  MOS-by-dimension 
combinations.  Because  of  the  inverted  relationship  between  Task- 
Based  percentile  standards  and  Task-Based  percent  GO  test  score 
standards  that  appears  in  Figures  5.8  and  5.9,  Task-Based  percent 
GO  test  score  standards  were  included  in  the  present  comparison. 
These  correlations  are  presented  in  Table  5.28. 


Table 5.28

Correlations between Behavioral Incident Cutoffs, Task-Based
Cutoffs, and Army Task Questionnaire Task Dimension Difficulty
Across 30 MOS-by-Dimension Combinations

                    Behavioral          Task-Based          Task-Based
                    Incident            Percentile Score    Percent Test Score
                      Marg   Acc   Out    Marg   Acc   Out    Marg   Acc   Out

Behavioral Incident:
  Marginal            1.00
  Acceptable           .26  1.00
  Outstanding          .10   .76  1.00

Task-Based, Percentile:
  Marginal             .07   .63   .57    1.00
  Acceptable           .09   .59   .57     .88  1.00
  Outstanding          .13   .50   .56     .63   .83  1.00

Task-Based, Score Cutoffs:
  Marginal             .43  -.11  -.15    -.09  -.20  -.05    1.00
  Acceptable           .35  -.15  -.06    -.16  -.26  -.04     .92  1.00
  Outstanding          .33  -.07   .12    -.20  -.21   .19     .69   .85  1.00

Army Task Questionnaire:
  Difficulty           .12  -.30  -.02    -.28  -.41  -.23     .45   .54   .56

The correlations may seem to present a contradictory
picture.  There  is  no  consistent  relationship  between  Behavioral 
Incident  standards  and  dimension  difficulty.  On  the  other  hand, 
difficulty  is  related  to  Task-Based  standards,  but  in  an 
interesting  pattern  that  is  apparently  related  to  the  inverse 
relationship  between  percent  GO  test  score  standards  and 
percentile  standards.  For  the  set  of  MOS  and  dimensions  at  hand, 
dimensions  with  more  difficult  tasks  are  given  higher  test  score 
(criterion-referenced)  standards,  but  when  those  test  scores  are 
translated  into  percentile  (norm-referenced)  standards,  we  see 
the  reverse.  A  fairly  straightforward  explanation  can  be  offered 
for  these  results.  Recall  from  Chapter  3  that  Difficulty  is 
positively  correlated  with  Frequency  and  Importance.  Given  that, 
one  may  assert  that  the  difficult  task  categories  tend  to  be  more 
important  and  more  frequently  performed  or  practiced.  During  the 
Task-Based  discussions,  SMEs  indicated  that  soldiers  should  be 
expected  to  have  higher  performance  on  important  and/or 
frequently  performed  tasks.  Therefore,  we  observe  positive 
correlations  between  difficulty  and  percent  GO  test  score 
standards.  Because  these  tasks  are  frequently  performed, 
soldiers  are  better  able  to  perform  them.  As  a  consequence,  more 
soldiers  score  high  on  tests  of  such  tasks;  therefore,  when  task 
standards are expressed as percentile scores, cutoffs appear
lower.
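
The reversal described above can be illustrated with a small sketch. The score lists and cutoffs below are invented for illustration only (they are not Project A data), and `percentile_of_cutoff` is a hypothetical helper that reports the percent of soldiers scoring below a given percent GO cutoff.

```python
def percentile_of_cutoff(scores, cutoff):
    """Percent of soldiers whose percent GO score falls below the cutoff."""
    below = sum(1 for s in scores if s < cutoff)
    return 100.0 * below / len(scores)

# Hypothetical percent GO scores (invented for illustration only).
frequent_dimension = [72, 78, 81, 85, 88, 90, 92, 94, 96, 98]    # practiced often
infrequent_dimension = [40, 48, 55, 60, 63, 67, 70, 74, 80, 86]  # practiced rarely

# A stricter test score standard on the frequently practiced dimension ...
pct_frequent = percentile_of_cutoff(frequent_dimension, 80)
# ... and a more lenient test score standard on the rarely practiced one.
pct_infrequent = percentile_of_cutoff(infrequent_dimension, 65)

print(pct_frequent, pct_infrequent)
```

In this made-up case the higher test score cutoff corresponds to the lower percentile cutoff, because most soldiers score well on the frequently practiced dimension.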

Notice  that  for  the  set  of  MOS-by-dimensions  examined,  there 
is  a  slight  negative  relationship  between  Task-Based  percent  GO 
test  score  standards  and  percentile  standards  (e.g.,  -.26  for  the 
Marginal  cutoffs).  Given  the  explanation  above,  one  could  expect 
differences  in  dimension  difficulty  to  account  for  that 
relationship. Indeed, holding Difficulty constant, the partial
correlation between test score standards and percentile standards
is -.02, supporting the argument that differences in Difficulty
(and Importance and Frequency) lead to both higher test score
standards and lower percentile standards.
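
As a sketch of the computation involved, the standard first-order partial correlation formula can be applied to three correlations from Table 5.28. The example uses the rounded entries for the Acceptable-level Task-Based standards and Difficulty (-.26, .54, and -.41). Because the inputs are rounded to two decimals, and the report does not state exactly which entries entered its own calculation, the result should be expected to be near zero rather than to reproduce the reported value exactly. The function name is an illustrative assumption.

```python
import math

def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, holding z constant."""
    numerator = r_xy - r_xz * r_yz
    denominator = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    return numerator / denominator

# Rounded Acceptable-level entries from Table 5.28.
r_score_pct = -0.26   # percent GO standard vs. percentile standard
r_score_diff = 0.54   # percent GO standard vs. Difficulty
r_pct_diff = -0.41    # percentile standard vs. Difficulty

r_partial = partial_correlation(r_score_pct, r_score_diff, r_pct_diff)
print(round(r_partial, 2))
```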

Table  5.28  also  provides  further  evidence  that  the 
Behavioral  Incident  Questionnaire  and  the  Task-Based  Standard 
Setting  Form  are  not  entirely  convergent.  The  standards  provided 
by  the  two  standard  setting  instruments  correlate  .07,  .59,  and 
.56,  respectively,  for  the  three  levels  of  performance.  Again, 
the  two  instruments  lead  to  different  sets  of  standards. 

Task  Complexity  Questionnaire 

In Phase III, we also briefly explored a second approach of
using task difficulty for determining selection requirements.
This approach is motivated by the general lack of differential
validity in the prediction equations for different jobs. The
assumption in this approach is that a relatively univariate
conception of "task complexity" could be established and that a
measure of task complexity could be linked directly to aptitude
requirement levels.

We developed a prototype Task Complexity Questionnaire to
provide  an  indicator  of  cognitive  ability  requirement  for  a  given 
task.  The  questionnaire  was  adapted  from  a  model  for  predicting 
skill  retention  in  task  performance  (Rose  et  al.,  1984).  The 
model  consists  of  a  systematic  way  of  predicting  the  decay  of 
performance  in  a  single  task  so  that  training  can  be  scheduled  to 
restore  acceptable  task  performance.  Factors  that  contribute  to 
defining  the  retention  of  task  performance  include  the  number  of 
steps,  presence  of  job  aids,  and  the  cognitive  demands  on  the 
soldier.  Each  of  these  factors  is  posed  to  SMEs  in  the  form  of  a 
question  and  the  response  options  are  weighted  in  terms  of  how 
likely  task  performance  will  be  maintained.  Based  on  the  skill 
retention  model,  there  are  10  factors  that  affect  how  well  task 
performance  will  be  retained. 

We adapted these 10 factors in the form of the prototype
Task Complexity Questionnaire, retaining the questions and
response options for the most part. We asked our SMEs those 10
questions about particular tasks. An example of the
questionnaire is found in Appendix A (pp. A-36 to A-40). It is
important to note that our Task Complexity Questionnaire approach
circumvents some of the detail that is present in the original
skill retention model data collection protocol, where a group of
SMEs provides a response to a question after the question has
been explained to them and they have had an opportunity for
discussion among themselves. Moreover, in the Task Complexity
Questionnaire, each SME was asked to provide complexity ratings
on two tasks, a common task and a job specific task. Each job
specific task was taken from one of the task dimensions used in
the standard setting exercises (e.g., Vehicle and Equipment
Operations).

Analyses  of  the  Task  Complexity  questionnaire  attempted  to 
answer  a  basic  question:  Can  judges  agree  on  the  demands  and 
complexity  of  a  task  using  a  simplified  rendering  of  the  skill 
retention  model? 

Scoring. We used a simple scoring procedure for each of the
10 items. In all the questions, the response options were scaled
from difficult to easy. Although the original skills retention
model scoring applied differential weights to the response
options, we used a simple scoring procedure where if there were
three response options, we used 1s, 2s, 3s, and so forth to score
responses from least to most complex. We computed a total score
based on the sum of the 10 item responses.
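
A minimal sketch of this unit-weighted scoring follows. The function name and the response values are hypothetical; the only elements taken from the text are the integer coding of response options and the summation over the 10 items.

```python
def total_complexity_score(item_responses):
    """Sum the 10 item responses, each coded 1, 2, 3, ... in option order."""
    if len(item_responses) != 10:
        raise ValueError("expected responses to all 10 items")
    return sum(item_responses)

# Hypothetical responses from one SME for one task (invented values).
sme_responses = [1, 4, 3, 3, 3, 1, 2, 2, 2, 2]
total = total_complexity_score(sme_responses)
print(total)
```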

Judge  Agreement.  Our  early  analyses  showed  that,  for  a 
selected  job  task,  judges  did  not  agree  on  the  best  option  for 
each  of  the  10  complexity  measures.  The  relatively  large 
standard  deviation  for  each  item  in  a  job  indicated  lack  of 
agreement  among  the  judges  (see  Table  5.29).  These  results  in 
Table  5.29  typified  the  results  that  were  obtained  for  other 
tasks  (see  Appendix  G). 
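
The agreement index used here, the standard deviation of each item across judges, can be sketched as follows. The ratings are invented, and the use of the sample (n - 1) standard deviation is an assumption, since the text does not say which form was computed.

```python
import statistics

def item_agreement(ratings_by_judge):
    """Return (mean, SD) per item across judges; rows are judges, columns are items."""
    items = list(zip(*ratings_by_judge))
    return [(statistics.mean(col), statistics.stdev(col)) for col in items]

# Hypothetical ratings: 4 judges x 3 items (invented values).
ratings = [
    [1, 4, 3],
    [1, 3, 2],
    [2, 5, 3],
    [1, 4, 4],
]
summary = item_agreement(ratings)
for number, (mean, sd) in enumerate(summary, start=1):
    print(f"item {number}: mean={mean:.2f} sd={sd:.2f}")
```

A larger standard deviation for an item indicates less agreement among judges on that item.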

The task complexity model approach was not convincing.
Given the very rough nature of the tryout of this approach, we
cannot conclude that it will not work. There is considerable
appeal in an underlying model in which task complexity is related
to ability requirements. However, much more effort is needed to
develop both measures of complexity and indicators of ability
requirements before a better test of this approach can be
conducted.


Table 5.29

Task Complexity Questionnaire Item Means and Standard Deviations
for an Electrical and Electronic Systems Maintenance Task

                                                MOS 27E          MOS 29E
Task Complexity Items                         MEAN   S.D.      MEAN   S.D.

 1. Are job or memory aids used?              1.28   0.46      1.17   0.38
 2. Quality of job aids.                      3.84   0.90      3.76   0.88
 3. How many steps are task divided?          2.72   0.68      2.86   0.65
 4. Steps performed in definite sequence?     3.24   0.44      3.07   0.38
 5. Built-in feedback?                        2.84   0.80      3.04   0.96
 6. Time limit for completion?                1.20   0.41      1.50   0.58
 7. Mental processing requirements?           1.84   0.62      2.04   0.58
 8. Number of facts, terms, etc. memorize?    2.08   0.86      2.46   1.04
 9. How hard are the facts or terms?          2.08   0.28      2.25   0.52
10. What are the motor control demands?       1.92   0.64      2.32   0.61

The general objective of this chapter was to describe two
potential  instruments  for  setting  performance  standards  which  can 
in  turn  be  used  to  evaluate  selection  standards.  Specific 
objectives  were  to  present  qualitative  feedback  obtained  from 
workshop  participants,  present  reliability  estimates,  and  present 
obtained  standards  with  a  description  of  some  of  the  influences 
on  those  standards.  In  addition,  the  Army  Task  Questionnaire 
Difficulty  scale  and  the  Task  Complexity  Questionnaire  were 
examined  for  their  relevance  to  the  standard  setting  problem. 

The qualitative feedback reinforces the position that
standard setting by its nature is complex, subjective, and
ambiguous. By requiring SMEs to work in the abstract, instead of
with a specific test, the Behavioral Incident and Task-Based
instruments increase the complexity and ambiguity. Because of
the discussion sessions, we have been able to describe rather
thoroughly SME thought processes regarding completion of the
Task-Based exercise. They presented a variety of thoughts and
strategies. There is no reason to assume that SMEs were not
equally diverse in their approaches to the Behavioral Incident
judgments.

In  spite  of  this  variation,  it  is  possible  to  pool  SME 
judgments  into  stable  performance  standards.  Because  the 
instruments  are  not  yet  ready  for  operational  use,  there  is  no 
need  to  give  precise  estimates  of  reliability  nor  to  give 
requirements  for  the  size  of  the  rater  pool.  At  this  point,  it 
is  sufficient  to  note  that  obtaining  acceptable  reliability  does 
not  appear  to  present  any  problem.  Furthermore,  differences 
between  rater  groups  appear  minimal  (in  the  case  of  the 
Behavioral  Incident  Questionnaire)  or  non-existent  (in  the  case 
of  the  Task-Based  Form).  As  with  the  Army  Task  Questionnaire, 
selection  of  raters  may  be  driven  more  by  political  concerns  than 
by  psychometric  requirements. 

Differences  in  Methods 

The most noticeable difference among the standards occurs
between methods, particularly for the lower level cutoffs where
Task-Based standards appear much more strict for Marginal
performance. Certainly, the standard setting problem would be
simpler if the two methods converged. They do not, and perhaps
an explanation can be gleaned from the SME comments. To lay some
background, one should recall that the two standard setting
methods were derived from two different Project A performance
measurement methods: ratings and performance tests. In the
Project A performance model (Campbell, McHenry, & Wise, 1990),
ratings and performance tests are associated with two different
domains. The performance tests, which are the referent for the
Task-Based exercise, are clearly associated with the Core
Technical and General Soldiering components, which are skills
related. The skill components are sometimes referred to as
"maximal" or "can do" performance and indicate what a soldier is
able to do at a particular point in time.² On the other hand,
the ratings, from which the Behavioral Incident Questionnaire
was derived, are more closely associated with the Effort and
Leadership component. This component is interpreted as
indicative of "typical" or "will do" performance.

Based on SME comments documented in the Qualitative Feedback
sections of this chapter, there appear to be differences in the
initial dispositions of SME approaches to the two methods. The
Behavioral Incident judgments asked SMEs to respond to the
following question:

    If a soldier CONSISTENTLY performed duties in this area
    at a level of effectiveness like the example incident,
    what kind of soldier would this be?

This focus on persons, rather than on test scores, may have
created a leniency in the "will do" arena. During the workshops,
the notion of consistent performance had to be emphasized
repeatedly by the instructions and workshop leaders to counter
SME tendencies to give soldiers described by poor performance
incidents "the benefit of the doubt." SME attributions seemed to
be that the soldiers could have performed correctly; they just
didn't happen to do so in the incident.

In  contrast,  the  Task-Based  method  focused  SME  attention  on 
task  performance  and  away  from  individual  persons.  The  judgment 
implied  that  SMEs  assume  a  test  situation  in  which  soldiers 
should  be  showing  their  best  performance.  SME  expectations, 
undoubtedly  reinforced  by  typical  Army  training  standards,  were 
that  if  a  task  was  important  enough  to  test,  soldiers  ought  to  be 
able  to  perform  it  reasonably  well.  We  have  consistently  seen 
the  expectation  that  at  least  60  to  70%  of  the  steps  in  the  task 
should be performed correctly. Even if the test includes scoring
of  trivial  steps,  performing  less  than  60  to  70%  on  any  task 
means  that  a  large  proportion  of  the  task  was  not  performed  or 
not  performed  correctly.  Performance  below  that  level  is 
interpreted  to  mean  that  a  soldier  cannot  (as  opposed  to  did  not) 
perform  the  task.  Scores  lower  than  60  to  70%  correct  are  simply 
not  acceptable  by  any  standard.  This  opinion  holds  up  even  after 
the  implication  that  nearly  30%  of  the  soldiers  would  be  deemed 
Unacceptable  is  made  clear  during  the  discussion  sessions.  In 
fact,  SMEs  became  slightly  more  strict  on  the  rerating.  They 
certainly  did  not  lower  their  test  performance  expectations  in 
light  of  the  distributions. 


² These are "maximal" in the sense that performance on these
tests is taken as maximal for a given point in time given a
soldier's abilities and experiences. They are not necessarily
"maximal" in the sense that aptitude tests are described.


As a result, we appear to have two different sets of
standards  each  of  which  is  logical  from  its  own  perspective.  How 
much  of  a  real  conflict  this  situation  presents  depends  on  the 
solution  to  the  linkage  of  performance  and  predictor  battery 
distributions.  Core  Technical  Performance,  General  Soldiering 
Performance,  and  Effort  and  Leadership  are  each  predicted  by 
(a)  cognitive  ability  and  (b)  a  temperament  composite  that 
includes  interests  and  job  reward  preferences  (McHenry,  Hough, 
Toquam,  Hanson,  &  Ashworth,  1990).  However,  the  balance  between 
the  two  predictors  and  overall  predictability  is  different  for 
the  two  skill  factors  versus  Effort  and  Leadership.  The  Effort 
and  Leadership  component  is  less  predictable  overall  (mean 
validity  for  nine  MOS  is  .44)  than  the  Core  Technical  and  General 
Soldiering  components  (mean  validities  for  nine  MOS  are  .65  and 
.69, respectively). For Effort and Leadership, prediction from
cognitive ability and from the temperament composite is nearly
equal, whereas prediction of the Core Technical and General
Soldiering components is more heavily weighted by cognitive
ability. Thus, what the differences between the two sets of
performance standards mean for evaluating predictor standards is
clouded by differences in the way the predictor domain is
associated with the performance components most closely related
to each set of standards. In addition, there may be regression
intercept differences for the two domains that nullify (or
magnify) performance standard differences when translated to
predictor differences.

Task  Dimension  Differences 


Dimension  differences  are  statistically  significant  for  both 
standard  setting  methods  although  they  appear  more  pronounced  for 
the  Task-Based  standards.  On  the  one  hand,  this  seems  logical 
given  our  arguments  concerning  differences  in  frequency, 
difficulty,  and  standards.  That  is,  frequent  (and  difficult) 
dimensions  are  given  different  standards  than  infrequent  (and 
less  difficult)  dimensions.  On  the  other  hand,  these  differences 
in  dimension  standards  pose  a  problem  because  they  indicate  that 
standards  should  be  set  explicitly  for  each  relevant  dimension. 
Standards  for  one  dimension  cannot  be  generalized  to  other 
dimensions.  The  problem  is  that  Project  A  data  from  the  nine 
Batch  A  MOS  are  required  for  constructing  and  scoring  both  the 
Behavioral  Incident  and  Task-Based  instruments.  Given  the 
present  approach,  instruments  cannot  be  constructed  to  cover  task 
dimensions  that  are  not  part  of  those  nine  MOS.  For  the  Task- 
Based  instrument,  the  performance  distribution  differences 
between  dimensions  indicate  that  appropriate  performance  data  may 
be  required  for  each  dimension.  To  the  extent  that  performance 
distributions  for  any  given  dimension  are  similar  across  MOS, 
there is some savings. Not all MOS would have to be studied;
however, MOS would have to be sampled, with performance tests
written and administered, in order to study all dimensions. On
the other hand, we have not established that performance
distributions are similar across MOS.

The problem of generalizing standards across dimensions is
less  troublesome  for  the  Behavioral  Incident  approach.  First, 
the  dimension  differences,  although  statistically  significant, 
are  smaller  than  for  the  Task-Based  standards,  particularly  for 
the  Acceptable  and  Marginal  cutoffs.  Second,  we  have  shown  that 
the  Behavioral  Incident  approach  does  not  depend  on  performance 
rating  data  because  the  transformation  of  incident  scale  values 
to  percentile  equivalents  appears  to  hold  across  different 
dimensions.  On  the  other  hand,  the  approach  does  require 
incidents that have been scaled. Thus, unless dimension
differences are ignored, dimensions that cannot be covered by
incidents from the nine Batch A MOS-specific BARS scales will have
to be constructed from newly developed and scaled incidents.

Given  the  above  limitations  on  the  two  standard  setting 
methods,  perhaps  the  Army  Task  Questionnaire  Difficulty  scale  and 
the  Task  Complexity  Questionnaire  merit  a  second  look.  For 
example,  standards  might  be  set  for  dimensions  that  can  be 
constructed  from  Project  A  data.  Then,  based  on  relative 
differences  in  difficulty,  standards  may  be  estimated  for  missing 
dimensions  from  standards  set  for  the  existing  dimensions.  While 
not  the  best  solution,  it  may  be  the  most  cost  effective  because 
neither  additional  performance  tests  nor  rating  scales  would  have 
to  be  developed. 

Finally,  it  remains  to  be  seen  whether  any  of  these 
dimension  differences  make  any  real  difference.  In  the  previous 
chapter,  we  saw  that  MOS  level  differences  in  ability 
requirements  are  elusive  when  we  do  not  have  empirical  data  to 
capture  them.  Part  of  this  failure  to  discriminate  the  MOS  is 
due to the multifaceted nature of all of the MOS; they have a lot
in  common.  Because  the  17  task  dimensions  are  organized  as 
multifaceted  task  components,  we  may  also  expect  many  of  the  task 
dimensions  to  share  overlapping  ability  requirements.  Further 
work  would  be  required  to  map  task  dimensions  to  ability 
requirements  and  make  projections  about  the  extent  to  which 
dimensions  are  unique  in  terms  of  those  abilities.  If  the 
different  task  dimensions  are  as  recalcitrant  in  yielding 
identifiable  ability  differences  as  the  MOS,  the  only  requirement 
for  standard  setting  may  be  to  set  standards  for  the  most 
important dimension. Note that this may lead to higher percent
GO  test  score  standards  and  lower  percentile  standards  for  the 
more  important  (and  more  difficult)  dimensions,  but  they  are 
standards  that  are  "validated"  in  the  sense  that  current  training 
practices  demonstrate  achievable  proficiency  levels. 

In  summary,  we  have  been  able  to  explore  the  complexities  of 
attempting  to  set  standards  for  performance  that  are  independent 
of  particular  test  content.  Although  we  have  certainly  not 
solved  the  problem,  we  have  identified  some  of  the  parameters 
that  need  to  be  addressed  in  future  efforts,  and  we  have  provided 
some  guidance  concerning  issues  that  need  further  examination. 

Chapter 6: Linkage
Lauress L. Wise (AIR)


The ultimate goal of the present research on procedures for
setting performance standards is to provide information for
setting and defending selection test score requirements.
Synthetic validation procedures can identify which abilities and
other attributes are most predictive of success on the job and
can estimate the level of prediction accuracy that can be
achieved using measures of these attributes as predictors. The
final step in developing selection procedures for specific MOS is
to determine the minimum levels of these abilities that should be
required. Thus, performance standards must be linked with
selection standards.

Issues  In  Linking  Selection  Standards  to  Job  Performance 

A number of important issues must be addressed in developing
procedures for setting minimum enlistment standards. These
include:

• Identification of the performance dimension(s) for which
performance standards will be set,

• Determination of the acceptability of different levels of
performance,

• Identification of the predictor composite that will be
used to select applicants into the job,

• Selection of a bivariate model for deriving an expected
distribution of job performance levels from any given
distribution of predictor scores,

• Estimation of the parameters of the expectancy model
relating job performance to enlistment test scores, and

• Determination of the percent of failures that the Army is
willing to tolerate.

The present project primarily addressed the first three
issues in the above list. In this section of the report, we
discuss each of the linkage issues, describing efforts in the
Synthetic Validation Project as well as in other projects to
develop a more complete linkage between selection measures and
performance.

Performance  Dimensions 


Results from Project A indicate that job performance can be
summarized in terms of five dimensions (Campbell, McHenry, &
Wise, 1990). These are: (a) core technical proficiency,
(b) general soldiering proficiency, (c) effort and peer
leadership, (d) personal discipline, and (e) physical fitness and
military bearing. The Army currently uses multiple selection
screens and these screens correspond, approximately, to the
multiple performance dimensions defined in Project A. This
relationship is shown in Figure 6.1. Within the present project,
we focused on setting performance standards for core technical
proficiency and then relating these standards to current and
alternative Aptitude Area composites. As described in Chapter 5,
we identified a number of specific performance dimensions similar
to those identified in creating the Project A MOS-specific BARS
and set standards for performance on each of these dimensions.

In  Phases  I  and  II  of  the  Army  Synthetic  Validity  Project,  we 
explored  the  relationship  of  standards  set  on  individual  job 
performance  dimensions  to  overall  performance  standards  and  found 
that  overall  acceptability  levels  could  be  approximated  quite 
closely  by  averaging  the  acceptability  of  performance  on  each  of 
the  more  detailed  dimensions. 

Performance  Acceptability  Levels 

As  described  in  Chapter  5,  we  defined  four  levels  of 
acceptability  for  use  in  setting  job  performance  standards.  We 
used  two  different  approaches,  Task-Based  and  Behavioral 
Incident,  to  describe  different  levels  of  performance  and  elicit 
judgments  regarding  the  acceptability  of  each  performance  level. 
The  resulting  judgments  were  converted  first  to  minimum 
acceptable  performance  scores  and  then  to  percentile  scores  as 
described  above. 

Predictor  Composite 

Identification  of  the  most  appropriate  composite  for 
predicting  core  technical  proficiency  was  the  focus  of  the 
synthetic  validation  portion  of  this  project.  The  results, 
described  in  Chapter  4,  indicate  that  highly  valid  predictor 
composites  can  be  produced  through  this  process,  although 
discrimination  among  jobs  is  minimal.  In  the  present  project,  we 
used  both  the  current  Aptitude  Area  composite  and  the  alternative 
composite  identified  through  synthetic  validation  as  the 
predictor composites of interest. The present Aptitude Area
Composites are scaled to have a mean of 100 and a standard
deviation of 20 for the 1980 Youth Population. We are assuming
that new composites would be placed on a similar scale.
Selection cut scores and predictor score distributions for the
Project A samples are then described in terms of this metric.
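
Placing a new composite on this metric amounts to a linear rescaling against a reference population. The sketch below assumes the reference mean and standard deviation are estimated from a reference sample. The raw scores are invented, and the function name is hypothetical.

```python
import statistics

def to_aptitude_area_scale(raw_scores, ref_mean, ref_sd):
    """Rescale raw composite scores so the reference population has mean 100, SD 20."""
    return [100.0 + 20.0 * (x - ref_mean) / ref_sd for x in raw_scores]

# Hypothetical raw composite scores for a reference sample (invented values).
reference = [10.0, 12.0, 14.0, 16.0, 18.0]
ref_mean = statistics.mean(reference)
ref_sd = statistics.pstdev(reference)

scaled = to_aptitude_area_scale(reference, ref_mean, ref_sd)
print([round(s, 1) for s in scaled])
```

Scores for any other sample (for example, a Project A MOS sample) would be rescaled with the same reference mean and standard deviation, not with that sample's own statistics.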

Bivariate  Model 


The bivariate model that we developed for relating
performance levels to predictor score distributions had two
components. The first component related performance levels to an
underlying continuous performance measure by assuming a set of
cut scores that defined minimum acceptable performance levels
along this continuous dimension. The second component was a
model for relating the underlying performance scores to
enlistment test scores. For this second component, we chose to
follow the standard relational model used in most regression
analyses. This model assumes that the predictor and criterion
variables, taken together, have a bivariate normal distribution.
This assumption is equivalent to three important conditions.
First, the marginal distributions of both the predictor and
criterion measures are normal (for some reference
population). Second, the relationship of the criterion to the
predictor is linear. (The function giving the average criterion
score for all individuals with the same predictor score will be a
linear function of the predictor score.) Finally, the
conditional variance of the criterion variable (the variance
about the prediction regression line) is constant throughout the
range of the predictor variable. Given the two relational models
and the criterion cut scores, performance level distributions
(the percent performing at each acceptability level) can be
estimated for any given predictor score level and also for any
distribution of predictor scores through numerical integration
techniques similar to those used in developing the Taylor-Russell
tables (Taylor & Russell, 1939).

Figure 6.1. Matching different areas of performance to different
selection screens.
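
The kind of calculation described above can be sketched as follows, assuming standardized (z-score) predictor and criterion variables. The validity of .60, the criterion cut scores at z = -1, 0, and +1, and the composite cut score of 95 are illustrative assumptions, not values from this project. The function names are hypothetical, and the simple midpoint rule stands in for whatever integration method was actually used.

```python
import math

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def level_probs_given_z(zx, r, cuts):
    """P(each performance level | predictor z-score) under bivariate normality.

    cuts: ascending criterion cut scores in z units; returns len(cuts) + 1 values.
    """
    resid_sd = math.sqrt(1.0 - r * r)      # constant conditional SD
    cum = [normal_cdf((c - r * zx) / resid_sd) for c in cuts]
    bounds = [0.0] + cum + [1.0]
    return [bounds[i + 1] - bounds[i] for i in range(len(bounds) - 1)]

def level_probs_above_cut(z_cut, r, cuts, upper=8.0, steps=4000):
    """Level distribution among those with predictor z >= z_cut (midpoint rule)."""
    width = (upper - z_cut) / steps
    totals = [0.0] * (len(cuts) + 1)
    mass = 0.0
    for i in range(steps):
        zx = z_cut + (i + 0.5) * width
        weight = normal_pdf(zx) * width
        mass += weight
        for k, p in enumerate(level_probs_given_z(zx, r, cuts)):
            totals[k] += weight * p
    return [t / mass for t in totals]

# Illustrative (assumed) values; levels run Unacceptable, Marginal,
# Acceptable, Outstanding.
r = 0.60
cuts = [-1.0, 0.0, 1.0]
z_cut = (95.0 - 100.0) / 20.0   # hypothetical cut of 95 on a mean 100, SD 20 scale

everyone = level_probs_above_cut(-8.0, r, cuts)   # effectively no selection
selected = level_probs_above_cut(z_cut, r, cuts)
print([round(p, 3) for p in everyone])
print([round(p, 3) for p in selected])
```

Comparing the two printed distributions shows how raising the predictor cut score shifts the expected percentages across the acceptability levels.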

These  conditions  assumed  in  the  relational  model  are  met,  at 
least  approximately,  in  most  of  the  Project  A  data  samples.  The 
primary  exception  is  that  the  criterion  measures  are  sometimes 
slightly  skewed.  This  is  not  an  important  violation  of  these 
assumptions,  however.  The  job  performance  metric  used  is 
entirely  arbitrary,  dependent  on  the  particular  items/tasks 
selected,  the  difficulty  of  the  questions  or  performance  steps 
used in assessing these items/tasks, and the severity of scoring.
We  can  transform  the  performance  scale  so  that  the  sample  will 
have  a  normal  distribution  without  any  loss  of  generality. 
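
One common way to carry out such a transformation is a rank-based normal score conversion. The text does not say which transformation was intended, so the sketch below is an assumption: each raw score is replaced by the standard normal deviate of its mid-rank proportion. The function name and the score values are hypothetical.

```python
from statistics import NormalDist

def normal_scores(raw_scores):
    """Map raw scores to normal deviates via mid-rank proportions (ties share a rank)."""
    n = len(raw_scores)
    std_normal = NormalDist()
    result = []
    for x in raw_scores:
        below = sum(1 for y in raw_scores if y < x)
        equal = sum(1 for y in raw_scores if y == x)
        proportion = (below + 0.5 * equal) / n   # strictly between 0 and 1
        result.append(std_normal.inv_cdf(proportion))
    return result

# Hypothetical negatively skewed percent GO scores (invented values).
skewed = [55, 70, 80, 85, 88, 90, 92, 95]
z = normal_scores(skewed)
print([round(v, 2) for v in z])
```

The transformation preserves the rank order of soldiers, so criterion cut scores can be carried onto the new scale without changing who falls above or below them.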

A  second  possible  exception  to  the  conditions  of  the 
bivariate  normal  model  is  that  measurement  error  may  not  be 
uniform  so  that  the  homogeneity  of  variance  assumption  does  not 
hold  exactly.  In  general,  regression  methods  are  relatively 
robust  to  violations  of  this  assumption.  In  the  present  case, 
significant  violation  of  this  assumption  could  lead  to  some  bias 
in  the  percentage  of  recruits  expected  to  pass  or  fail  a 
particular  performance  standard.  The  effects  of  such  violations 
would  be  more  significant  when  the  predictive  relationship  is 
weak  (so  that  the  error  variation  accounts  for  more  of  the  total 
variation  of  the  criterion  variable).  For  most  MOS,  the 
predictive  relationship  is  quite  strong,  again  arguing  that  the 
effects  of  violation  of  the  homogeneity  of  variance  assumptions 
will  be  minimal.  Nonetheless,  further  research  on  both  the 
linearity  and  homogeneity  of  variance  assumptions  might  be  useful 
before  the  proposed  linkage  system  is  used  operationally. 

Estimation  of  Parameters 


Estimation  of  the  parameters  of  the  relational  model  turns 
out  to  be  a  very  difficult  problem.  Project  A  provided  extensive 
data  on  both  performance  measures  and  a  wide  variety  of  potential 
predictor  measures,  but  these  data  are  limited  to  just  the  19  MOS 
in  the  Concurrent  Validation  phase  of  Project  A.  What  can  be 
done  about  all  of  the  other  (more  than  250)  MOS  for  which 
enlistment  standards  must  be  set? 

The  primary  approach  pursued  in  developing  parameter 
estimates  followed  the  general  approach  used  in  synthetic 
validation.  For  each  job,  core  technical  proficiency  was  broken 
down  into  a  set  of  more  detailed  performance  components.  The 
detailed  components  were  selected  from  the  total  set  of 
components measured in Project A. Two different bases were used
in  defining  the  detailed  components.  First,  we  examined  the 
Project  A  MOS-specific  rating  scales,  with  each  scale  defining  a 
separate  performance  component.  Second,  we  examined  the  tasks 
selected  for  hands-on  testing  and  sorted  them  into  the  components 
defined  by  the  rating  scales.  The  result  of  this  process  was  a 
set  of  17  dimensions  that  were  somewhat  more  general  than  the 
detailed  elements  used  in  the  Task-Based  job  descriptions,  but 
for  which  an  approximate  correspondence  could  be  established. 

The  advantage  of  these  more  general  composites  was  that  empirical 
data  on  performance  from  Project  A  were  available  at  this  level 
of  detail. 

The  general  model  underlying  our  "synthetic"  approach  to 
standard  setting  had  two  main  assumptions.  First,  it  was  assumed 
that  the  job  performance  components  could  be  described  in  such  a 
way  that  their  relationship  to  the  predictor  measures  (ability 
levels)  was  constant  across  jobs.  This  means  that  data  from 
Project  A  could  be  used  to  link  performance  (in  terms  of  specific 
behavioral  incident  effectiveness  levels  or  specific  task  percent 
GO  levels)  to  predictor  score  levels  for  all  MOS  for  which  the 
dimension  was  relevant,  not  just  for  the  Project  A  MOS.  Note, 
however,  that  we  did  not  assume  that  a  given  performance  level 
had  the  same  degree  of  acceptability  for  all  jobs,  only  that  it 
related  to  the  same  ability  levels  for  all  jobs. 

The  second  assumption  used  in  our  "synthetic"  approach  was 
that  the  different  detailed  performance  dimensions  were 
"compensatory",  so  that  the  acceptability  of  overall  performance 
could  be  determined  by  averaging  the  acceptability  of  performance 
on each of the detailed dimensions. This assumption was strongly
supported  by  data  collected  during  Phase  I  and  Phase  II  in  which 
judges  were  told  the  acceptability  of  performance  on  individual 
dimensions  and  asked  to  rate  overall  acceptability.  The  results 
indicated  that  a  compensatory  model  accounted  for  virtually  all 
of  the  variance  in  the  outcome  ratings  (Szenas  &  Wise,  1989). 

Given  these  assumptions,  the  steps  that  we  used  to  estimate 
the  parameters  of  the  relational  model  were  as  follows: 

Cut  Scores 


We  determined  acceptability  cut  scores  for  each  detailed 
performance  dimension  as  described  in  Chapter  5,  using  both  a 
"percent  GO"  metric  (for  the  Task-Based  method)  and  an 
effectiveness  level  metric  (for  the  Behavioral  Incident  method). 
We  determined  the  percentage  of  the  relevant  Project  A  sample(s) 
who scored below (or above) each cut score, putting the results
from the two approaches onto a common metric. This resulted in
three  cut  scores,  dividing  the  four  acceptability  levels  (from 
Unacceptable  to  Outstanding)  for  each  combination  of  method  and 
dimension.  We  then  averaged  these  percents  across  the  different 
performance  dimensions  and  methods  used  with  a  given  MOS. 
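
The averaging step can be illustrated with a minimal Python sketch.
The table below is hypothetical: the method and dimension labels and
all percentages are invented for illustration and are not Project A
results. Each entry holds the percent of the sample scoring below the
three cut scores for one combination of method and dimension.

```python
from statistics import mean

# Hypothetical percent of the sample scoring below each of the three
# cut scores, keyed by (method, dimension). Illustrative values only.
percent_below = {
    ("task_based", "dimension_1"): (4.0, 28.0, 86.0),
    ("task_based", "dimension_2"): (6.0, 32.0, 84.0),
    ("behavioral_incident", "dimension_1"): (5.0, 30.0, 88.0),
    ("behavioral_incident", "dimension_2"): (5.0, 30.0, 82.0),
}


def average_cut_percents(table):
    """Average the percent-below values cut score by cut score
    across every (method, dimension) entry for one MOS."""
    return tuple(mean(column) for column in zip(*table.values()))


mos_cut_percents = average_cut_percents(percent_below)
```

The result is one averaged percent-below value per cut score for the
MOS, which is the input the next step converts to a normal metric.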
Performance Score Metric


We created a "generic" performance scale by converting from the
percentile  metric  derived  in  the  previous  step  to  a  metric  on 
which  performance  scores  would  be  normally  distributed.  We  used 
the  inverse  normal  distribution  function  to  convert  cut  scores  in 
a  percentile  metric  (percent  of  soldiers  below  a  given  cut  score) 
to  "z-scores"  for  which  the  distribution  of  the  Project  A  sample 
members  would  have  a  mean  of  zero  and  a  standard  deviation  of 
one.  At  this  point,  the  scale  was  arbitrary.  The  important 
considerations  were  that  performance  distributions  would  be 
approximately  normal  and  that  the  cut  scores  were  positioned 
correctly  relative  to  the  distribution  of  performance  in  the 
Project  A  samples.  It  might  have  been  meaningful  to  convert  back 
to  either  the  percent  GO  or  the  effectiveness  level  performance 
metrics,  but  performance  distributions  on  these  metrics  were 
sometimes  skewed.  Consequently,  a  bivariate  normal  model  for 
relating  performance  scores  to  predictor  scores  might  not  fit 
well.  In  any  event,  final  results  would  be  displayed  in  terms  of 
acceptability  levels  so  that  the  metric  for  the  underlying 
performance  dimension  did  not  matter. 
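
The percentile-to-z-score conversion can be sketched as follows,
using the inverse normal function in Python's standard library. The
cumulative percentages are hypothetical examples, not values from the
Project A samples.

```python
from statistics import NormalDist


def cut_scores_to_z(percent_below):
    """Convert cumulative percentages (percent of soldiers scoring
    below each cut score) to z-scores on a performance scale with
    mean zero and standard deviation one."""
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    return [std_normal.inv_cdf(p / 100.0) for p in percent_below]


# Hypothetical percent below the three cut scores that separate
# Unacceptable / Marginal / Acceptable / Outstanding.
example_percent_below = [5.0, 30.0, 85.0]
z_cuts = cut_scores_to_z(example_percent_below)
```

Because the conversion is monotonic, the ordering of the cut scores
is preserved; only their spacing changes.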

Regression  Parameter  Estimates 

We estimated the regression line slope to be: b = R * Sy / Sx,
where R was the overall validity estimate using overall core
technical proficiency as the criterion; Sy, the criterion
standard deviation, was unity for the Project A sample; and Sx,
the predictor standard deviation, was estimated from the same
Project A sample. We estimated the regression line intercept to
be: c = My - b * Mx, where My, the criterion mean, was zero by
assumption, b was the regression slope estimated in the previous
step, and Mx, the predictor mean, was estimated from the Project
A sample. Finally, we estimated the prediction error variance to
be: Ve = Vy * (1 - R**2), where Vy, the criterion variance, was
again unity by assumption and R was again the estimated validity.
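
As a worked illustration, the three estimates can be computed
directly from a validity estimate and the predictor mean and standard
deviation. The function name and the input values below are
assumptions chosen for illustration (a validity of .50 and a
predictor mean of 100 with a standard deviation of 20); they are not
estimates from the Project A sample.

```python
def linkage_parameters(validity, predictor_mean, predictor_sd,
                       criterion_mean=0.0, criterion_sd=1.0):
    """Return (slope, intercept, error variance) for the regression
    of standardized performance on the predictor composite."""
    # Slope: validity times criterion SD over predictor SD.
    slope = validity * criterion_sd / predictor_sd
    # Intercept: criterion mean minus slope times predictor mean.
    intercept = criterion_mean - slope * predictor_mean
    # Error variance: criterion variance times (1 - validity squared).
    error_variance = criterion_sd ** 2 * (1.0 - validity ** 2)
    return slope, intercept, error_variance


# Illustrative inputs only.
b, c, ve = linkage_parameters(validity=0.50,
                              predictor_mean=100.0,
                              predictor_sd=20.0)
```

With these inputs the slope is .025 performance units per predictor
point, the intercept is -2.5, and the error variance is .75.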

Determination  of  Maximal  Failure  Rates 


Exact  determination  of  acceptable  failure  rates  necessarily 
involves  a  detailed  cost-benefit  analysis,  trading  off  the  cost 
of  higher  levels  of  selectivity  against  the  benefit  of  reduced 
failure  rates.  Such  analyses  are  the  primary  focus  of  a  major 
new  project  being  supported  by  the  Office  of  the  Assistant 
Secretary  of  Defense  (Force  Management  and  Personnel).  In  the 
present  project,  we  used  the  average  percent  unacceptable  and 
percent  less  than  fully  acceptable  from  the  standard  setting 
exercises  to  define  maximum  failure  rates. 

Demonstration  Software 


We  developed  a  computer  program  to  demonstrate  the  linkage 
model  that  was  just  described.  The  program  uses  a  database  with 
the  linkage  relationships  estimated  for  the  MOS  included  in  this 
project. This database includes performance cut scores for each
MOS  and  also  regression  slope,  intercept,  and  error  variance 
parameters.  The  user  may  vary  additional  parameters,  including 
the  selection  test  cut  score,  the  validity  of  the  predictor 
composite,  and  the  distribution  of  predictor  scores  among 
applicants.  The  program  then  displays  the  percentage  of  recruits 
expected  to  perform  at  each  level  of  acceptability.  In  an 
alternate  mode,  the  user  may  specify  an  expectancy  requirement 
(e.g.,  maximum  percent  Unacceptable,  minimum  percent  Outstanding) 
and  the  program  will  determine  the  minimum  selection  test  score 
that  will  meet  the  requirement.  In  either  case,  the  program  also 
displays  expected  selection  rates  and  the  number  of  applicants 
that  must  be  tested  in  order  to  achieve  desired  numbers  of  new 
recruits.

Figure  6.2  shows  the  display  screen  from  the  demonstration 
linkage  program.  The  screen  heading  credits  the  Synthetic 
Validation  Project  and  identifies  the  MOS  for  which  linkage 
parameters are displayed. The upper "box" shows the expected
number and percent of recruits at each performance acceptability
level. The total number of recruits can be set to specific
annual  goals  or  left  to  a  default  value  of  1,000.  The  lower  box 
shows  the  assumptions  used  in  developing  the  expected  performance 
level  estimates.  Some  of  the  assumptions  are  derived  from  other 
assumptions.  The  percent  qualifying  is  derived  from  the  minimum 
qualifying  score  and  the  applicant  AA  score  mean  and  standard 
deviation.  The  number  of  applicants  needed  is,  in  turn,  derived 
from  the  required  number  of  accessions  and  the  percent 
qualifying. 
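
The two derived quantities can be sketched in a few lines of Python.
This is not the demonstration program's code; it assumes that
applicant scores are normally distributed and that the number of
applicants is rounded to the nearest whole number. Under those
assumptions, the inputs shown in Figure 6.2 (minimum qualifying score
90, applicant mean 100, standard deviation 20, 1,000 accessions)
give the displayed values of 69.1 percent qualifying and 1,446
applicants needed.

```python
from statistics import NormalDist


def percent_qualifying(min_score, applicant_mean, applicant_sd):
    """Percent of applicants scoring at or above the minimum
    qualifying score, assuming normally distributed scores."""
    applicants = NormalDist(mu=applicant_mean, sigma=applicant_sd)
    return 100.0 * (1.0 - applicants.cdf(min_score))


def applicants_needed(accessions, pct_qualifying):
    """Applicants who must be tested to yield the required number
    of accessions (rounding to nearest is an assumption)."""
    return round(accessions / (pct_qualifying / 100.0))


pct = percent_qualifying(min_score=90, applicant_mean=100,
                         applicant_sd=20.0)
needed = applicants_needed(accessions=1000, pct_qualifying=pct)
```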

Figure  6.3  shows  the  screen  when  "Change  Assumptions/ 
Constraints"  is  selected.  The  menu  at  the  bottom  of  this  screen 
lists  the  changes  that  may  be  made  by  the  user.  Option  3, 
performance  requirements,  requires  further  explanation.  It 
allows  the  user  to  specify  a  minimum  percent  acceptable  or 
outstanding  or  a  maximum  percent  unacceptable.  The  program  then 
estimates  the  smallest  "minimum  qualifying  score"  for  which  the 
performance  requirement  will  be  met.  If  other  parameters,  such 
as estimated validity, are changed, the program continues to work
backwards  to  compute  minimum  qualifying  scores.  Reverse  mode  is 
terminated  if  a  new  minimum  qualifying  score  is  specified. 
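
One way such a reverse computation could work is sketched below. It
is an illustration under stated assumptions, not the demonstration
program's algorithm: applicant scores are taken to be normal,
performance is taken to follow the linear regression model with
normal errors, the integral is approximated numerically, and
candidate qualifying scores are scanned in whole points. All
parameter values are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for the error term


def fraction_below(perf_cut, min_score, slope, intercept, error_var,
                   app_mean, app_sd, steps=2000):
    """Expected fraction of selected applicants (score >= min_score)
    whose performance falls below perf_cut, by midpoint-rule
    integration over the applicant score distribution."""
    applicants = NormalDist(app_mean, app_sd)
    upper = app_mean + 8.0 * app_sd  # stands in for +infinity
    width = (upper - min_score) / steps
    error_sd = error_var ** 0.5
    below = 0.0
    selected = 0.0
    for i in range(steps):
        x = min_score + (i + 0.5) * width
        weight = applicants.pdf(x) * width
        predicted = intercept + slope * x
        below += weight * STD_NORMAL.cdf((perf_cut - predicted) / error_sd)
        selected += weight
    return below / selected


def smallest_qualifying_score(max_fraction, perf_cut, slope, intercept,
                              error_var, app_mean, app_sd,
                              low=40, high=160):
    """Scan whole-point scores upward and return the first one for
    which the expected fraction below perf_cut meets the limit."""
    for score in range(low, high + 1):
        if fraction_below(perf_cut, score, slope, intercept, error_var,
                          app_mean, app_sd) <= max_fraction:
            return score
    return None  # requirement cannot be met within the scanned range


# Hypothetical linkage parameters and requirement: at most 10 percent
# of recruits below a performance cut score of z = -1.0.
params = dict(slope=0.025, intercept=-2.5, error_var=0.75,
              app_mean=100.0, app_sd=20.0)
min_score = smallest_qualifying_score(0.10, -1.0, **params)
```

A linear scan is used because it is simple and because the expected
failure rate declines as the qualifying score rises whenever the
slope is positive; a bisection search would also serve.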

The  demonstration  program  is  not  intended  to  be  used  at  this 
time  for  setting  qualifying  scores.  The  assumptions  involved  in 
computing  the  estimated  performance  levels  are  numerous  and 
complex.  Further  development  and  validation  would  be  required 
before  this  program  is  used  in  a  "production"  mode.  The  primary 
value  of  the  program  is  that  it  illustrates  the  data  and 
assumptions  that  are  required  in  creating  a  linkage  and  allows 
users  to  explore  the  interplay  between  various  factors  involved 
in  the  linkage  such  as  validity,  performance  cut  scores,  and 
selection  ratios. 
Simulated Linkage of Aptitude Area Composites to Expected Performance Levels
Developed for the Army Synthetic Validation Project

MOS: 11B - Infantryman

Expected Distribution of Job Performance

 Unacceptable      Marginal       Acceptable     Outstanding
 Number  Pct.    Number  Pct.    Number  Pct.    Number  Pct.
     20   2.0       275  27.5       569  56.9       136  13.6

Current Assumptions

Aptitude Area Composite Used: CO          Estimated Validity: 50
Applicant Aptitude Area Score Mean: 100   Standard Deviation: 20.0
Minimum Qualifying Score: 90              Percent Qualifying: 69.1
Required Number of Accessions: 1000       Applicants needed: 1446
Performance Requirements: None

Next Action:

1. New MOS                          3. Print Current Screen
2. Change Assumptions/Constraints   4. Exit Program

Your choice:

Figure 6.2. Initial screen from Linkage demonstration software.


Simulated Linkage of Aptitude Area Composites to Expected Performance Levels
Developed for the Army Synthetic Validation Project

MOS: 11B - Infantryman

Expected Distribution of Job Performance

 Unacceptable      Marginal       Acceptable     Outstanding
 Number  Pct.    Number  Pct.    Number  Pct.    Number  Pct.
     20   2.0       275  27.5       569  56.9       136  13.6

Current Assumptions

Aptitude Area Composite Used: CO          Estimated Validity: 50
Applicant Aptitude Area Score Mean: 100   Standard Deviation: 20.0
Minimum Qualifying Score: 90              Percent Qualifying: 69.1
Required Number of Accessions: 1000       Applicants needed: 1446
Performance Requirements: None

Next Action:

1. Minimum AA Score            4. Number of Accessions Required
2. Selection Test Validity     5. Applicant Means and S.D.s
3. Performance Requirements    6. No changes

Your choice:

Figure 6.3. Change assumptions/constraints screen.

The linkage methodology described in this section, while it
requires  a  large  number  of  assumptions,  appears  workable  given 
adequate  estimates  of:  (a)  percent  of  incumbents  performing  at 
each  acceptability  level,  (b)  predictor  score  distributions  for 
these  same  incumbents,  and  (c)  a  validity  estimate  for  the 
predictor  score.  The  linkage  methodology  provides  approximate 
failure  rate  estimates  for  different  combinations  of  validity, 
selectivity,  and  acceptability.  Exact  procedures  for  setting 
selection  cutoffs  would  require  evaluation  of  the  costs  and 
benefits  associated  with  different  failure  rates. 


Chapter 7: Summary and Discussion

Norman G. Peterson (PDRI), Lauress L. Wise (AIR),
and  John  P.  Campbell  (HumRRO) 


The  Synthetic  Validity  Project  developed  and  evaluated  a 
series of alternative procedures for: (a) analyzing jobs in terms
of  their  critical  components,  (b)  obtaining  expert  judgments  of 
the  validities  of  an  array  of  individual  attributes  for 
predicting  the  critical  components  of  performance,  (c) 
establishing  prediction  equations  for  specific  jobs  when 
criterion-related validation data are not available, (d)
estimating  criterion  referenced  performance  standards  for 
specific  jobs,  and  (e)  specifying  scores  on  the  predictor  battery 
that  would  be  necessary  to  achieve  the  desired  performance 
standard,  given  the  bivariate  distribution  between  predictor 
scores  and  performance  scores.  The  work  of  the  project  was 
firmly  grounded  in  previous  research  and  theory  and  two  main 
literature  reviews  were  produced,  one  on  synthetic  validation 
(Crafts,  Szenas,  Chia,  &  Pulakos,  1988)  and  one  on  setting 
performance  standards  (Pulakos,  Wise,  Arabian,  Heon,  &  Delaplane, 
1989). 

The  project  began  with  three  alternative  procedures  for 
analyzing  jobs,  three  major  methods  (with  variations)  for 
generating  prediction  equations,  and  three  principal  scaling 
procedures  for  estimating  performance  standards.  The  evaluations 
of  the  competing  methods  proceeded  iteratively  through  three 
major  phases.  After  Phases  I  and  II,  some  methods  were  dropped 
from  consideration,  others  were  revised,  and  additional 
parameters  were  designed  into  the  evaluation.  In  general, 
procedures  were  evaluated  in  terms  of  their  reliabilities, 
distributional  properties,  and  discriminant  and  convergent 
validities.  For  the  prediction  equations,  the  absolute  level  of 
validity  generated,  the  discriminant  validity  across  jobs,  and 
the  correspondence  of  the  synthetic  or  transported  equations  to 
the  job  specific  empirical  results  were  of  special  interest.  For 
the standard setting investigation, the relative stringency of the
standards across methods and across MOS was of particular
interest.

Because  the  relevant  empirical  data  were  expert  judgments, 
relatively  large  samples  of  judges  were  used  in  each  phase  across 
a  larger  number  of  jobs  than  had  been  used  in  any  previous  study. 
This  permitted  a  number  of  parameters  of  both  jobs  and  judges  to 
be  investigated.  The  availability  of  the  Project  A  concurrent 
validation  data  provided  an  unprecedented  opportunity  to  compare 
alternative  synthetic  methods  to  actual  empirical  results  and  to 
anchor  performance  standards  in  known  performance  distributions. 

As  a  result,  the  Synthetic  Validity  Project  was  able  to 
investigate  a  number  of  issues  that  had  never  been  addressed 
before. While the answers are by no means definitive, a great deal
was  learned,  new  issues  were  identified,  and  certain 
recommendations can be made about future use of the methods.
These  have  been  discussed  in  some  detail  in  the  preceding 
chapters,  but  the  major  findings  and  their  implications  are 
summarized  below. 

Job  Analytic  Methods  for  Synthetic  Validation 

As  a  consequence  of  the  results  obtained  in  Phases  I  and  II, 
the  attribute  model  and  the  job  behavior  method  were  set  aside 
and  the  Army  Task  Questionnaire  became  the  procedure  of  choice. 
While  all  methods  provided  very  reliable  descriptions,  the  task 
questionnaire  yielded  greater  discriminability  across  MOS  and 
seemed to have higher acceptability among the judges.

Although  we  settled  on  job  tasks  as  the  descriptive  unit, 
the  other  types  of  components  did  not  perform  poorly.  The  job 
task method was the best of a set of good options. Several
different  types  of  item  response  scales  for  job  component 
questionnaires  were  compared,  but  they  were  very  highly 
correlated.  However,  collecting  separate  task  importance  ratings 
of  the  Core  Technical,  General  Soldiering,  and  Overall 
Performance  aspects  of  the  job  does  appear  to  be  crucial  for  an 
effective  job  analysis.  Although  there  were  some  very  slight 
differences  between  subject  matter  experts'  judgments, 
differences  in  the  supervisory  level  of  SMEs  and  their 
organizational  point  of  view  (a  training  orientation  vs.  an 
operational  unit  orientation)  appear  to  make  virtually  no 
difference  in  the  usefulness  of  the  job  analysis  data  for  the 
purpose  of  forming  synthetic  prediction  equations. 

Synthetic  Equations 

Judgments  about  the  validity  of  human  attributes  for 
predicting  job  descriptor  elements  proved  to  be  particularly 
robust  across  judges  who  differed  across  a  fairly  wide  range  of 
relevant  psychological  training  and  experience.  In  terms  of  the 
validity  of  equations  developed  from  those  judgments,  it  appears 
that  as  long  as  the  judges  have  some  graduate  level  psychological 
training  and  a  minimum  amount  of  relevant  research  experience, 
then  there  will  be  no  practically  significant  differences  in  the 
validity  of  the  synthetic  equations  developed  from  the  judgments. 
With  regard  to  forming  synthetic  equations,  weighting  methods 
that  set  judgments  of  predictor  validities  with  low  values  to 
zero  appear  to  improve  the  discriminant  validity  of  the  resulting 
equations,  with  perhaps  a  slight  cost  in  absolute  validity. 

The  synthetic  validation  methods  produced  equations  that 
have  only  slightly  lower  absolute  validities  than  least  squares 
equations developed directly on the jobs themselves, depending on
the  criterion  and  method  of  forming  the  synthetic  equation.  It 
appears  that  the  synthetic  equation  achieves  levels  of  absolute 
validity  that  are  about  96%  of  those  achieved  with  least  squares 
equations  developed  on  the  MOS,  and  levels  of  discriminant 
validity that are about 33% of those achieved by the least
squares  equations. 

More  importantly  perhaps,  the  synthetic  equations  produce 
results  very  similar  to  more  traditional  validity 
transportability  methods  when  both  methods  are  applied  to  the 
problem  of  identifying  an  appropriate  prediction  equation  for  a 
job  for  which  no  empirical  validation  can  be  undertaken.  In  this 
case, the best level of absolute validity/discriminant validity
obtained by synthetic methods is .65/.02, whereas the best method
for transportability (a "MOS-match" method) produces values of
.67/.03. However, the MOS-match method may not always produce an
acceptable equation; that is, a new MOS may not correlate highly
enough  (in  terms  of  Army  Task  Questionnaire  profile  correlations) 
with  an  existing  MOS  to  warrant  confidence  in  the  use  of  an 
associated  equation.  The  synthetic  method  will  always  produce  an 
acceptable  equation  in  the  sense  that  it  uses  all  the  information 
supplied  by  the  SMEs  to  form  the  equation.  Both  methods  require 
the  collection  of  Army  Task  Questionnaire  data  from  SMEs  for  the 
MOS for which an equation is to be developed. Psychometrically,
10-15 SMEs would be sufficient; politically, however, a larger
number should be used to ensure representation of all important
constituencies.

What  should  the  Army  do  when  it  must  select  or  develop  a 
prediction  equation  for  an  MOS  for  which  empirical  data  are  not 
available,  because  the  MOS  is  new,  or  because  empirical 
validation  research  cannot  be  carried  out  for  an  existing  MOS? 
Based  on  the  research  described  here,  we  can  say  that  there  are 
several  good  options  available  but  no  clear-cut  choice  between 
them.  The  synthetic  method,  the  "MOS-match"  method,  the 
"cluster-match"  method,  and  the  "general"  method  all  produced 
absolute  validities  over  .60  but  no  discriminant  validity  greater 
than  .02.  An  excellent  opportunity  to  collect  additional 
information  about  the  robustness  of  these  methods  will  present 
itself  in  the  ongoing  Career  Force  project.  The  prediction 
equations  developed  in  the  current  project  can  all  be  applied  to 
the  data  developed  on  that  project.  These  analyses  will  provide 
information  about  the  generalizability  of  the  various  equations 
across  different  samples  of  soldiers  and  across  a  different 
validation  design  (concurrent  vs.  longitudinal).  The  synthetic 
equations  tested  here  used  no  sample-based  data  in  their 
development,  whereas  all  the  other  methods  used  least  squares 
optimization.  It  may  be  that  the  synthetic  equations  will 
maintain  their  absolute  and  discriminant  validity  levels  better 
than  the  other  methods. 

What  do  these  results  mean  for  personnel  research  in 
general?  First,  the  statement  made  by  Mossholder  and  Arvey 
(1984)  that  no  large-scale  synthetic  validation  research  had  been 
completed  is  no  longer  accurate  (actually,  another  large-scale 
synthetic validation research project [Peterson, Rosse, &
Houston, 1982] was completed some years ago, but is not widely
known).  Secondly,  it  appears  that  the  use  of  appropriately 
qualified judges to form prediction equations via synthetic
models  will  lead  to  prediction  equations  that  are  nearly  as  valid 
as  more  traditional  transported  or  generalized  equations. 

Thirdly,  for  the  data  analyzed  in  this  project,  synthetically 
developed  equations  appeared  to  be  nearly  as  valid  as  least 
squares  equations  developed  via  a  full-blown,  criterion-related 
validation  study  that  uses  work  sample  tests,  job  knowledge 
tests,  and  supervisory  ratings  as  criteria  and  employs  a 
sufficient  sample  size  (N  >  250).  These  results  seem  to  support 
and  extend  work  by  Schmidt  and  colleagues  (e.g.,  Schmidt,  Hunter, 
Croll,  &  McKenzie,  1983)  concerning  the  usefulness  of  expert 
judgments  in  validation  research.  Fourthly,  we  did  not  find 
large  amounts  of  discriminant,  or  differential,  validity  across 
jobs.  However,  the  relative  pattern  of  discriminant  validities 
was  as  it  should  be  for  the  different  criterion  measures.  That 
is,  discriminant  validity  was  greatest  when  estimated  against  the 
Core  Technical  Proficiency  criterion  and  less  when  General 
Soldiering  or  Overall  Performance  was  used.  If  differential 
validity  is  to  be  observed,  it  must  come  from  genuine  differences 
in  the  core  task  content  which  in  turn  have  different  ability  and 
skill  requirements. 

In  retrospect,  it  may  be  reasonable  to  expect  only  a 
moderate  amount  of  differential  validity.  First-tour  MOS  in  the 
Army  come  from  only  one  sub-population  of  the  occupational 
hierarchy,  entry-level  skilled  positions,  and  do  not  encompass 
any  supervisory,  managerial,  advanced  technical,  or  formal 
communication  (i.e.,  writing  and  speaking)  components.  Further, 
the two-year first-tour incumbent is still at a relatively early
stage  on  the  way  to  expert  performance.  Among  others,  Ackerman 
(1988)  argues  that  general  abilities  will  be  important  for  a  wide 
range  of  tasks  at  these  early  stages  of  mastery. 

Another  issue  that  needs  additional  investigation  is 
identified  by  the  lack  of  correspondence  between  matching  jobs  on 
the  basis  of  their  task  requirements  and  matching  them  on  the 
basis  of  their  prediction  equations.  Recall  from  Chapter  4  that 
the  empirically  based  prediction  equation  transported  from  the 
MOS  with  the  highest  "task  match"  did  not  always  yield  validities 
that  were  higher,  or  as  high,  as  prediction  equations  from  other 
MOS.  Why  is  that?  Whatever  the  reason,  it  is  most  likely  also 
the  cause  of  observing  basically  the  same  average  validities  for 
the  general  equations  as  for  the  average  of  the  MOS  or  cluster 
specific equations. If differential task content were a perfect
mirror of differential job requirements, then by definition the
individual MOS or MOS cluster equations would capture more
information than a single general equation. The fact that such
is  not  the  case  implies  that  the  job  analytic  methods  so  far 
developed  cannot  capture  everything  that  the  empirical  weights 
can.  They  most  likely  will  never  be  able  to  do  so,  because  few 
problems  have  perfect  solutions.  However,  investigating  the 
reasons  for  the  lack  of  correspondence  in  matching  task  profiles 
versus  matching  prediction  equations  would  offer  additional  clues 
about how job analytic methods might be made even more sensitive
to  differential  job  requirements. 

In  summary,  for  this  sub-population  of  jobs,  the  synthetic 
methods  are  reasonable  ways  to  generate  prediction  procedures  in 
situations  where  no  empirical  validation  data  are  available. 
Absolute  and  discriminant  prediction  accuracy  will  suffer  just  a 
bit  because  the  synthetic  methods  tend  to  weight  the  array  of 
predictors  more  similarly  across  jobs  than  do  the  empirical 
estimation  procedures.  The  similarity  between  the  synthetic 
methods  and  validity  generalization  bears  further  investigation 
as  to  why  they  seem  to  experience  the  same  slight  decrement  when 
compared  to  job  specific  empirical  validation.  It  may  indeed  be 
for  different  reasons. 

Finally,  it  seems  clear  from  these  results  that  personnel 
psychology  has  learned  a  great  deal  about  the  nature  of  jobs  and 
the  individual  differences  that  forecast  future  performance  on 
jobs.  For  many  subpopulations  of  the  occupational  hierarchy, 
such  as  the  one  considered  in  this  project,  expert  judges  can 
take  advantage  of  good  job  analysis  information  almost  as  well  as 
empirical  regression  techniques.  While  this  may  not  be  true  for 
other  parts  of  the  occupational  spectrum,  we  are  indeed  learning. 

Setting  Performance  Standards 

Within  the  Synthetic  Validity  Project,  the  investigation  of 
procedures  for  setting  performance  standards  confronts  directly 
what  may  be  one  of  the  most  difficult  measurement  problems  of 
all.  That  is,  the  principal  objective  was  to  evaluate 
alternative  methods  for  scaling  individual  performance  against 
defined  standards.  Standards  are  represented  as  specific  levels 
of  performance  that  the  organization  defines  to  be  critical  in 
terms  of  specific  operational  outcomes.  For  the  current  project, 
the  critical  levels  were  defined  in  terms  of  performance  that  was 
judged  as  Unacceptable,  Marginal,  Acceptable,  or  Outstanding. 

The  operational  meaning  of  each  performance  level  was  discussed 
in  Chapter  5.  The  principal  reason  for  scaling  levels  of 
performance  in  such  operational  and  measurement-independent  terms 
is  to  provide  a  means  for  setting  selection  standards  (e.g., 
critical scores on the AFQT).

To  the  best  of  our  knowledge,  the  problem  of  scaling 
performance  standards  has  never  before  been  addressed  in  a  non- 
educational  setting  and  has  never  been  examined  in  the  way  it  was 
in  this  project  in  any  job  situation.  In  this  regard,  the 
Synthetic  Validity  Project  was  in  completely  uncharted  territory 
and  faced  a  large  number  of  new  and  complex  issues. 

Almost  all  previous  work  related  to  performance  standards 
has  been  in  the  educational  setting  and  goes  under  the  label  of 
criterion-referenced  measurement.  In  this  context,  the  critical 
scores  are  referenced  to  a  specific  criterion  measure  (e.g., 
certification  exams  or  achievement  tests).  The  current  project's 
literature review on standard setting (Pulakos et al., 1989)
summarized  the  available  methods  and  their  research  results. 


Professional  licensing  boards  impose  standards  of  a  sort 
when  they  certify  individuals  for  membership  in  a  profession. 
However,  in  this  instance  it  is  really  the  predictor  that  is 
being scaled and not professional performance. That is,
"passing" the licensing or certification exam is taken as a
forecast  of  successful  professional  performance.  No  research 
study  has  ever  attempted  to  establish  the  predictive  validity  of 
licensing  or  certification  standards.  Finally,  for  a  few  jobs  in 
the  labor  force,  industrial  engineering  or  human  factors 
procedures have been used to establish standard times for
specific job tasks. Such time standards typically do not
consider directly the level of performance required. Failure to
meet the time standards is seen as a problem in motivation,
training,  or  job  design. 

In  summary,  the  examination  of  standards  in  education, 
professional  licensing,  or  industrial  engineering  does  not  deal 
with  the  same  questions  as  the  current  project.  The  Synthetic 
Validity  Project  addressed  a  much  more  difficult  set  of  issues. 
That  is,  can  the  available  job  performance  information  be  used  to 
scale  critical  levels  of  job  performance  itself,  such  that  there 
would  be  an  organizational  consensus  as  to  what  constitutes 
Unacceptable,  Marginal,  Acceptable,  and  Outstanding  performance? 
We  realized  from  the  start  this  was  a  very  difficult  problem  that 
incorporates  both  complex  scaling  issues  and  very  sensitive  value 
judgments.  We  were  not  disappointed. 

In  the  present  case,  we  investigated  the  feasibility  of 
setting  performance  standards  for  new  and  existing  jobs  without 
going  to  the  time  and  expense  of  developing  and  administering 
specific tests. While we tried to link different performance
levels  to  specific  consequences  (e.g.,  dismissal,  remedial 
training),  judges  were  aware  that  the  overall  goal  of  these 
exercises  was  to  set  selection  test  cutoffs  and  not  to  carry  out 
the  specific  consequences  described. 

Much  was  learned  from  this  research  about  the  feasibility  of 
specific  procedures  for  scaling  standards  of  performance.  For 
example,  it  seems  necessary  that  the  SMEs  fully  understand  the 
objectives  and  the  consequences  of  the  standard  setting  exercise. 
It  seems  likely  that  the  frame  of  reference  for  the  judgments 
will  influence  the  level  of  performance  designated  as  the 
standard.  Consequently,  there  are  complex  value  judgments  to  be 
resolved  up  and  down  the  organizational  hierarchy.  The  consensus 
achieved  among  the  SMEs  in  the  current  project  is  a  major  step  in 
that  direction,  but  the  consequences  of  the  procedure  must  be 
open  from  the  start.  Not  specifying  them  completely  would  only 
lead  to  complications  later. 

As  the  methods  were  revised  and  improved,  the  scale  values 
themselves  proved  to  be  quite  reliable  across  judges  and  the 
small mean differences across types of judges (e.g., NCOs vs.
Officers)  could  readily  be  interpreted  in  terms  of  their 
respective  frames  of  reference.  The  dominant  result  was  of  high 
consistency  across  judges.  A  potential  difficulty  to  guard 
against  is  the  frame  of  reference  produced  by  the  wide 
experiences  of  Army  personnel  with  the  Skill  Qualification  Test 
(SQT).  The  stimulus  material  presented  for  any  standard  setting 
procedure must be clearly distinguished from the context of
the SQT.

The most significant conclusion of the standard setting
research was that the different methods that we developed and
evaluated  led  to  different  results.  Very  strict  standards  were 
set  when  performance  was  described  in  terms  of  "Percent  Go" 
scores  on  hands-on  task  performance  tests.  As  many  as  half  of 
current  incumbents  were  less  than  fully  acceptable  and  nearly  30% 
were  unacceptable  according  to  the  standards  set  by  the  Task- 
Based  method.  By  contrast,  fewer  than  40%  of  current  incumbents 
were  less  than  fully  acceptable  and  only  *%  were  unacceptable 
according  to  the  standards  set  by  the  Behavioral  Incident  method. 
Direct  estimates  of  the  percent  of  incumbents  at  each  performance 
level,  collected  in  Phase  II,  fell  between  these  two  extremes. 
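
The way a set of cutoffs translates into a distribution of incumbents across performance levels can be illustrated with a short sketch. The Python fragment below is an illustration only and is not the procedure used in the project. The function names, the cutoff values (55, 75, and 95), and the ten scores are hypothetical and were chosen to keep the arithmetic easy to follow. The sketch assumes that each incumbent has a single percent-correct score and that a score equal to a cutoff falls in the higher level.

```python
from bisect import bisect_right
from collections import Counter

LEVELS = ["Unacceptable", "Marginal", "Acceptable", "Outstanding"]


def classify(score, cutoffs):
    """Return the performance level for one score.

    cutoffs lists the minimum scores for Marginal, Acceptable, and
    Outstanding in ascending order. Scores below the first cutoff are
    Unacceptable; a score equal to a cutoff goes to the higher level.
    """
    return LEVELS[bisect_right(cutoffs, score)]


def level_percentages(scores, cutoffs):
    """Percent of incumbents falling in each performance level."""
    counts = Counter(classify(s, cutoffs) for s in scores)
    return {level: 100.0 * counts[level] / len(scores) for level in LEVELS}


# Hypothetical scores and cutoffs, for illustration only.
scores = [42, 50, 58, 61, 66, 70, 74, 78, 83, 96]
cutoffs = [55, 75, 95]

print(level_percentages(scores, cutoffs))
# {'Unacceptable': 20.0, 'Marginal': 50.0, 'Acceptable': 20.0, 'Outstanding': 10.0}
```

Raising any one cutoff in this sketch moves incumbents into the next lower level, so two methods that place the cutoffs differently on the same score distribution will report different proportions of incumbents at each level.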

It  is  likely  that  much  of  the  difference  between  the  two 
standard  setting  approaches  was  due  to  properties  of  the 
empirical  data  used  to  estimate  the  distribution  of  incumbent 
performance  levels  rather  than  to  the  methods  themselves.  The 
standard  setting  judges  may  not  have  accounted  for  a  tendency 
toward  leniency  in  the  ratings  provided  by  supervisors  and  peers 
in  the  Project  A  data.  The  judges  also  may  not  have  appreciated 
the strictness with which the hands-on exercises were scored nor
differences  from  the  SQT  where  soldiers  are  afforded  an 
opportunity  to  practice  before  being  tested.  Further  research  is 
needed  to  find  out  more  about  the  cognitive  processes  of  the 
judges  as  they  were  setting  standards  by  each  method  and  to 
compare  their  assumptions  to  the  conditions  under  which  the 
empirical  data  were  collected.  For  this  reason,  we  are  not 
prepared  to  reject  either  method  but  recommend  further  work 
before  any  method  is  put  to  operational  use. 

Another  significant  finding  was  the  relative  lack  of  MOS 
differences  in  the  resulting  performance  distributions.  The  Army 
may  very  well  be  successful  in  designing  selection  and  training 
systems  and  job  requirements  to  roughly  equate  the  proportions  of 
Unacceptable,  Marginal,  Acceptable,  and  Outstanding  performers  in 
each  MOS.  Further  research  is  needed  to  explore  the  extent  to 
which  this  is  the  case  or  whether  the  standard  setting  procedures 
are  simply  insensitive  to  differences  in  performance 
requirements.  For  example,  with  a  larger  set  of  stimulus 
material,  it  might  be  possible  to  use  more  and  less  difficult 
tasks  (or  behavioral  dimensions)  and  examine  the  extent  to  which 
the  judges  adjust  their  standards  accordingly.  Further  research 
on  the  generalizability  of  each  method  across  MOS  through 
"substitution" of appropriate tasks or behaviors is also
essential.


More  research  also  is  needed  to  examine  judges'  hypotheses 
about  the  consequences  of  their  judgments  and  the  effect  of  these 
hypotheses  on  the  standards  that  they  set.  If  judges  were 
setting  standards  that  might  actually  lead  to  soldiers  being 
discharged,  would  they  set  them  differently?  Some  might  argue 
that  anyone  not  actually  discharged  must  be  performing  acceptably 
and  that  selection  test  standards  should  be  linked  to  actual 
rates  of  remedial  training  and  discharges  rather  than  to 
"hypothetical"  performance  standards  not  explicitly  linked  to 
operational  decisions. 

The  Task-Based  method  is  limited  by  the  set  of  tasks  for 
which  empirical  data  are  available.  In  further  refinement  of 
this method, development and administration of additional hands-
on task tests might be required. Further research might explore
the feasibility of providing "scorer training" to the judges
before they make MOS-specific substitutions and before they make
judgments.  Summative  methods  that  examine  each  item  in  a  test  in 
setting  an  overall  passing  score  could  also  be  explored  as  a 
means  of  achieving  a  closer  linkage  between  the  empirical  data 
and  the  standard  setting  judgments. 

The  Behavioral  Incident  method  is  also  limited  by  the 
existing pool of behavioral incidents, and expansion of this pool
might  be  required  before  the  approach  is  used  operationally. 
Incidents  are  needed  for  additional  dimensions  and  also  in  the 
average  effectiveness  range.  Alternative  data  collection  formats 
might  be  explored,  such  as  having  judges  rate  the  probability  of 
observing  each  incident  for  specific  soldiers  that  they  judge  to 
be  at  specific  performance  levels.  This  also  might  increase  the 
correspondence  between  the  standard  setting  judgments  and  the 
empirical  data. 

The  approach  to  linkage  appeared  useful  as  far  as  it  went. 
Before  being  used  operationally,  further  checks  on  assumptions 
such  as  linearity  and  homogeneity  of  variance  should  be  made. 

Most importantly, the linkage between test scores and expected
performance level is tenable, but cost-benefit analyses are
required  to  determine  the  expectancy  levels  that  provide  optimal 
tradeoffs  between  recruiting  costs  and  the  costs  and  benefits  of 
resulting  performance  levels. 
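
The role of the linearity and homogeneity of variance assumptions can be seen in a small sketch of an expectancy calculation. The Python fragment below is an illustration under stated assumptions, not the linkage procedure used in the project: it assumes that performance is a linear function of the test score plus normally distributed error with the same standard deviation at every score level. The function names and all numeric values (intercept, slope, residual standard deviation, performance standard, and target expectancy) are hypothetical.

```python
from statistics import NormalDist


def expectancy(test_score, intercept, slope, resid_sd, standard):
    """Probability that performance meets the standard at a given test score.

    Assumes performance = intercept + slope * test_score + error, with
    error ~ Normal(0, resid_sd) at every score (homogeneity of variance).
    """
    predicted = intercept + slope * test_score
    return 1.0 - NormalDist(predicted, resid_sd).cdf(standard)


def minimum_qualifying_score(intercept, slope, resid_sd, standard, target):
    """Test score at which the expectancy equals the target (slope > 0)."""
    z = NormalDist().inv_cdf(1.0 - target)
    return (standard - intercept - resid_sd * z) / slope


# Hypothetical regression parameters and performance standard.
intercept, slope, resid_sd, standard = 20.0, 0.5, 10.0, 75.0

for target in (0.5, 0.8):
    cut = minimum_qualifying_score(intercept, slope, resid_sd, standard, target)
    check = expectancy(cut, intercept, slope, resid_sd, standard)
    print(f"target {target:.2f}: cutoff {cut:.1f}, expectancy {check:.2f}")
```

Under these assumptions a higher target expectancy always requires a higher qualifying score, which is why the choice of expectancy level becomes a cost-benefit question. If the relationship were not linear, or if the error variance differed across score levels, the cutoff returned by this calculation would not carry the stated expectancy.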

Appendix A

Examples  of  Phase  III  Instruments 


INTRODUCTION  TO  JOB  DESCRIPTION  WORKSHOPS 

There  are  two  long-range  goals  for  this  project:  (a)  to  develop 
techniques  that  can  be  used  to  identify  the  specific  skills  and  abilities 
required  to  perform  successfully  in  each  entry-level  MOS  in  the  Army;  and 
(b)  to  develop  procedures  for  determining  the  minimum  ability  requirements  for 
each  entry-level  MOS.  These  goals  must  be  reached  in  order  for  the  Army  to 
take  advantage  of  recent  advances  in  personnel  selection  research  conducted  by 
the  Army. 

The research that led to these advances began six years ago, when the
Army  Research  Institute  began  an  investigation  of  the  Armed  Services 
Vocational  Aptitude  Battery  (ASVAB),  which  all  of  the  Armed  Services  use  to 
select  new  recruits  and  to  classify  recruits  into  MOS.  The  Army  was 
interested  in  determining  how  well  the  ASVAB  scores  that  an  applicant  obtains 
when he or she enlists predict on-the-job performance in the Army in the
soldier's  first  tour.  The  Army  also  was  interested  in  determining  whether  new 
tests,  such  as  temperament  tests  or  psychomotor  tests,  could  be  added  to  the 
ASVAB  so  that  it  would  predict  job  performance  even  more  accurately. 

The  Army  selected  19  MOS  for  detailed  investigation.  For  each  of  these 
MOS,  a  number  of  job  performance  tests  were  developed.  These  included  hands- 
on  tests,  paper-and-pencil  job  knowledge  tests,  and  performance  rating  scales. 
Approximately  500  first-tour  soldiers  from  each  MOS  completed  these 
performance  tests.  At  the  conclusion  of  testing,  scores  on  the  performance 
tests  were  compared  with  soldiers'  ASVAB  scores  to  determine  how  well  the 
ASVAB  predicted  job  performance. 

Results  showed  that  the  ASVAB  did  an  excellent  job  of  predicting  how 
well  soldiers  could  perform  the  tasks  they  had  been  assigned  on  their  jobs. 

The  results  also  revealed  that  the  new  temperament  tests  predicted  job 
motivation  and  personal  discipline  better  than  the  ASVAB  did. 

One  finding  from  this  investigation  was  that  the  Army  could  improve  the 
selection  and  classification  of  new  recruits  into  some  MOS  by  making  changes 
in  the  ASVAB  aptitude  area  composites.  Those  changes  have  now  been  made. 

More changes may be made in the days ahead, especially if the Army decides to
add  new  tests  to  the  ASVAB. 

The  primary  limitation  of  this  investigation  was  that  the  Army  was  able 
to study only 19 MOS in detail. As the Army prepares to make additional
changes in the ASVAB and in the way ASVAB is used to select and classify
recruits into MOS, it must develop techniques that can be used to identify the
specific  skills  and  abilities  required  to  perform  successfully  in  all  269 
entry-level  MOS  in  the  Army.  The  Army  also  must  develop  procedures  for 
determining  the  minimum  ability  requirements  for  each  MOS. 

We have prepared several different rating and judgment procedures that
we  think  NCOs  and  officers  will  be  able  to  use  to  help  us  identify  the  skill 
and  ability  requirements  of  first-tour  soldiers  in  their  MOS.  What  we  want 
you  to  do  today  is  try  out  some  of  the  procedures  we  have  developed  and  let  us 
know  how  easy  or  difficult  you  find  them.  We  want  you  to  tell  us  when 
instructions are unclear. In short, we want your help in refining the
procedures,  so  that  we  can  obtain  ratings  and  judgments  that  are  as  accurate 
as  possible. 


DATA REQUIRED BY THE PRIVACY ACT OF 1974
(5 U.S.C. 552a)

PRESCRIBING DIRECTIVE
AR 70-1

1. AUTHORITY
10 USC Sec 4503

2. PRINCIPAL PURPOSE(S)
The data collected are to be used for research purposes only.

3. ROUTINE USES
This is an experimental personnel data collection activity
conducted by the U.S. Army Research Institute for the Behavioral
and Social Sciences pursuant to its research mission as prescribed
in AR 70-1. When identifiers (name or Social Security Number) are
requested they are to be used for administrative and statistical
control purposes only. Full confidentiality of the responses will
be maintained in the processing of these data.

4. MANDATORY OR VOLUNTARY DISCLOSURE AND EFFECT ON INDIVIDUAL NOT PROVIDING INFORMATION
Although your participation in this research is voluntary, we
encourage you to provide complete and accurate information in the
interests of the research. There will be no effect on you for not
providing all or any part of the information. This notice may be
detached from the rest of the form and retained by you if so desired.

FORM
DA Form 4368-R, 1 May 7
Privacy Act Statement - 26 Sep 75

BACKGROUND INFORMATION

1. Name: ____________ (Last)  ____________ (First)  ____ (MI)

2. SSN: ____________

3. Date: ____ (Day)  ____ (Month)  ____ (Year)

4. Post: ____________

5. Unit: ____________

6. Your Position or Job Title: ____________
   (Include your MOS code if you are a soldier.)

7. Sex:   [ ] Male   [ ] Female

8. Race:  [ ] Black/Afro-American
          [ ] American Indian
          [ ] Hispanic
          [ ] White
          [ ] Other

9. Please enter your current pay grade (for example E6, W2, O2, or GS-9): ______

10. Time in the Army (including time in service and, for civilians, time working
    for the Army as a civilian): _____ years _____ months

11. MOS you are rating (circle one):

    12B  13B  27E  29E  31C  310  31F  51B  54B  55B  95B  9(B

12. Experience with MOS you are rating: _____ years _____ months

    Experience includes time spent working in or supervising persons in the MOS,
    training persons for the MOS, reviewing and revising doctrine or training and
    testing programs for the MOS.

Performance Area Definitions

Look at the EXAMPLES below and read through their explanations before starting to make your ratings.

pressure, change sterile dressings, etc.); does not include first aid.
etc.).


Decode Data: Use coding systems and rules to decipher and interpret
coded information (for example, use CEOI, interpret
symbols/signs, etc.).


Control Money: Keep accounting records; disburse and collect
money and money orders.

Standard Setting Exercises
Performance  Level  Definitions 


We  have  designed  two  exercises  to  set  job  performance  standards.  In  each 
exercise,  we  would  like  you  to  help  us  set  standards  for  job  performance  that 
will  allow  us  to  determine  whether  a  soldier's  performance  is  Unacceptable, 
Marginal,  Acceptable,  or  Outstanding. 


Unacceptable:  Soldiers who consistently perform like this should not
               have been selected for this MOS. Their performance is
               hurting the Army. Additional training would not bring
               their performance up to acceptable levels.

Marginal:      Soldiers who consistently perform like this need extra
               or remedial training. Their current performance is of
               little or no benefit to the Army.

Acceptable:    Soldiers who consistently perform like this are doing
               an adequate job. They are making positive
               contributions to the Army.

Outstanding:   Soldiers who consistently perform like this are doing
               extremely well. They are making exceptional
               contributions to the Army and are good examples to
               other soldiers.


Keep  these  definitions  handy  as  you  complete  the  following  questionnaires. 
Please refer back to them from time to time.

Name:


Behavioral  Incident  Standard  Setting  Questionnaire 
12B - COMBAT ENGINEER


In this section of the workshop we would like you to help us set job performance
standards on two or three broad performance areas that apply to the MOS that you are rating.
For each area, twenty behavioral incidents, or examples of performance, have been provided
by other SMEs as samples of the types of behaviors that fit each area. These examples
come from a number of different MOS and they vary in level of effectiveness. Thus, some
incidents illustrate poor performance and some illustrate good performance, but they all
illustrate performance within a particular type of job behavior.


For  each  area,  read  the  definition  and  think  of  similar  types  of  tasks  that  are  performed  in 
the  MOS  that  you  are  rating.  Then  for  each  behavioral  incident  ask  yourself  the  following 
question: 

If a soldier CONSISTENTLY performed duties in this area at a level of
effectiveness  like  the  example  incident,  what  kind  of  soldier  would  this  be? 

Refer  to  the  one-page  handout  containing  the  definitions  of  Unacceptable,  Marginal, 
Acceptable,  and  Outstanding  performance  to  guide  you  as  you  make  your  ratings.  Make  your 
ratings by thinking of similar types of incidents for your MOS. Circle the letter that matches
the level of effectiveness of the incident. If any incident is so unfamiliar that you cannot decide
what level of performance effectiveness it represents, then circle CNR for "cannot rate."

Please  make  sure  that  you  circle  only  one  response  for  each  example. 


Remember: As you make your ratings, think about soldiers who have
about 24 months of service in this MOS after Basic and AIT. Also keep
in mind all that you know about the full range of duty assignments for
this MOS.


Example:

Demonstrate Leadership - Demonstrate leadership and maturity. Act as a model, give
direction and instruction to peers; support peers and/or provide informal counseling; and promote
a positive public image of the military.

1. This soldier spent many duty and non-duty hours learning his new MOS. In a few
   months, he was tops in his MOS and was selected as the first E-4 to evaluate other
   soldiers in the MOS.

   U  M  A  (O)  CNR

The rater read the definition of Demonstrate Leadership and the example and decided that a soldier
who consistently performed like this example would be demonstrating outstanding leadership.
Therefore, the rater circled the "O" for Outstanding.

D. Vehicle and Equipment Operations - Drive or operate heavy
mechanical equipment.


U - Unacceptable
M - Marginal
A - Acceptable
O - Outstanding
CNR - Cannot Rate


1. While this soldier was driving an 8 ton goer up a hill, the transmission
   locked. When the soldier tried to force it by stepping on the gas
   pedal, the engine blew up.
   U  M  A  O  CNR

2. As the driver of an M60A1 on a road march, this soldier maintained
   the proper interval between his vehicle and the one in front of his, and
   also maneuvered properly through different types of terrain.
   U  M  A  O  CNR

3. While driving an M915 hauling hazardous cargo, this soldier drove the
   truck through a tunnel.
   U  M  A  O  CNR

4. During a tactical road march on an ARTEP, this soldier's tank came
   under enemy fire. He quickly and successfully maneuvered the tank to
   a safe location using proper terrain features.
   U  M  A  O  CNR

5. This soldier overloaded the hoist capacity and was reckless when the
   load was in the air. His actions resulted in the injury of one man and
   damage to the vehicle and the hoist.
   U  M  A  O  CNR

6. While driving the tank to the wash rack, this soldier failed to use a
   ground guide. He hit a car and a fence while he was backing up.
   U  M  A  O  CNR

7. While driving a tractor and 5000 gallon tanker on an icy road, the
   tanker started to jack knife. The soldier carefully steered the vehicle
   and got control of the tanker before crashing.
   U  M  A  O  CNR

8. This soldier failed to use a rear ground guide when backing up the
   tank. He smashed into another tank, damaging both tanks.
   U  M  A  O  CNR

9. While delivering cargo to soldiers in the field at night, this soldier's
   vehicle got stuck. This soldier used the self recovery system to free
   the vehicle.
   U  M  A  O  CNR

10. This soldier did not hook up the lifting shackles correctly when using a
    wrecker to recover a jeep. When he pulled away with the wrecker, the
    jeep tore loose.
    U  M  A  O  CNR

11. This soldier was given a badge for driving 2 years without an accident.
    U  M  A  O  CNR

12. This soldier, while driving a howitzer, exceeded the safe speed and
    pivoted the gun too sharply. He hit a sidewalk, causing damage to
    personal property as well as the gun.
    U  M  A  O  CNR

13. This soldier was assigned to recover a 2 1/2 ton. When he arrived at
    the disabled vehicle, he hooked up the tow bar, made the proper
    connections, rigged a safety chain between the inside of the bumper
    and the hoist hook, and raised the vehicle off of its front wheels. The
    2 1/2 ton was successfully towed back to the shop.
    U  M  A  O  CNR

14. While driving a 1/4 ton vehicle on commitment, this soldier started off
    in second gear.
    U  M  A  O  CNR

15. While driving his tank during a field training exercise, this soldier
    always looked for the best route to travel and the best battle positions
    to park the tank.
    U  M  A  O  CNR

16. This soldier was sent to recover a 1 1/4 ton that had gone over on its
    side on a hill. He rigged the vehicle incorrectly before pulling it,
    causing about $500.00 more damage than the accident had caused.
    U  M  A  O  CNR

17. This soldier was driving too fast in a night convoy and hit the vehicle
    ahead of him/her when he/she rounded a curve and found the convoy
    had stopped.
    U  M  A  O  CNR

18. While driving across an open field, this soldier drove into a swamp and
    shifted gears. As a result, his tank became stuck in the swamp
    and had to be pulled out.
    U  M  A  O  CNR

19. This soldier used the proper passive defense procedures when he/she
    encountered sniper fire.
    U  M  A  O  CNR

20. This soldier failed to move his howitzer into position. This resulted in
    a delay for the entire section.
    U  M  A  O  CNR

M. Individual Combat - Engage in combat and survival skills; know
customs and laws of war.


1. When assigned to be flank security for his squad, this soldier moved
   back into the formation when the vegetation became too thick and
   moved back out when it thinned.
   U  M  A  O  CNR

2. This soldier failed to lower his head while a Claymore mine was fired.
   He was injured by the back blast.
   U  M  A  O  CNR

3. This soldier was assigned to escort a new prisoner to his cell in the
   confinement facility. The prisoner had cooperated fully with all of the
   other soldiers and authorities at the confinement facility. However, this
   soldier verbally abused and criticized the new prisoner, even though the
   prisoner was being cooperative.
   U  M  A  O  CNR

4. This soldier's unit was pinned down by an automatic weapon position.
   This soldier jumped up and threw a grenade, destroying the enemy
   position. But the soldier was killed, too.
   U  M  A  O  CNR

5. This soldier got lost during a land navigation exercise. A search party
   found him several feiirkt away from his destination.
   U  M  A  O  CNR

6. After finding a soldier who had been exposed to a nerve agent, this
   soldier first put on his protective mask and administered the antidote
   to himself. He then masked the casualty, administered the antidote to
   the patient, and called for further medical assistance.
   U  M  A  O  CNR

7. This soldier was instructed not to chamber a round into his M16, but
   did so anyway. Later, while on patrol, he was surprised by another
   soldier. This soldier automatically fired, seriously injuring the other
   soldier.
   U  M  A  O  CNR

8. During a field exercise, this soldier failed to perform his assigned
   duties of constructing a fighting position, establishing a listening post,
   and installing a telephone.
   U  M  A  O  CNR

9. While performing field duties at a dismount point, this soldier failed to
   use a sign/countersign challenge when an individual entered the
   company area.
   U  M  A  O  CNR

10. During this soldier's tour of guard duty, the field officer of the day and
    the officer of the guard approached. This soldier aggressively
    challenged them and displayed thorough knowledge of his
    responsibilities. As a result, the company was helped in achieving
    outstanding ratings.
    U  M  A  O  CNR

11. Although he was inexperienced in the handling of explosives, this
    soldier claimed he had worked with TNT. He accidentally set off the
    TNT, however, killing himself and an NCOIC.
    U  M  A  O  CNR

12. Because this soldier did not know how to disassemble his .45 caliber
    pistol, another soldier had to help him do it.
    U  M  A  O  CNR

13. This soldier was assigned the task of constructing a machine gun
    emplacement on the perimeter. He constructed the position using good
    concealment and cover. He then properly filled out his range card
    with the correct fields of fire and final protective line.
    U  M  A  O  CNR

14. On an FTX this soldier, who was serving as the compass man, took the
    patrol on a pre-planned route to the objective rallying point. This
    soldier also successfully guided the patrol back to the point of origin.
    U  M  A  O  CNR

15. Even with the instructions right in front of him, this soldier could not
    decontaminate his skin with the decontamination kit.
    U  M  A  O  CNR

16. This soldier searched a POW in a field environment. The soldier
    failed to search the subject below the waist. Consequently, the POW
    pulled out a knife and killed the soldier.
    U  M  A  O  CNR

17. This soldier was not using his bayonet-rifle aggressively during training
    and probably would have lost a confrontation with an opponent.
    U  M  A  O  CNR

18. When this soldier, who was point man on a reconnaissance patrol,
    noticed a shack to the front, he got the attention of the leader before
    the patrol walked into a suspected enemy position.
    U  M  A  O  CNR

19. This soldier observed a prisoner attempting to escape over a fence.
    The soldier ordered the prisoner to halt and was able to apprehend
    the prisoner without firing any shots.
    U  M  A  O  CNR

20. Even with hours of practice this soldier was unable to throw a grenade
    through a window from more than 10' away.
    U  M  A  O  CNR

Task-Based Standard Setting Exercise
Instructions  and  EXAMPLE 

In this exercise, we would like you to help us set standards for performance in two or three fairly general
areas. These areas could apply to more than one MOS; some examples are Individual Combat, Vehicle
and Equipment Operation, and Communication.

There are two major steps that will be completed for each task area. The first step involves group
participation, while the second step is completed individually. Refer to the EXAMPLE on the next page as you
read through the steps below.

Step 1.  Read the Task Area Definition and the Sample Tasks listed there. Under the "Yes/No" column,
         circle "Y" if you think the Sample Task is performed in the MOS you are rating; circle "N" if you
         think it is not performed in this MOS. If you circle "N," try to think of a task that is performed in
         this MOS that is similar to the Sample Task in terms of the type of operations or steps involved,
         the kinds of skills required, and the degree of difficulty in performing the task. However, do not
         write your "substitute" task down yet.

         After everyone has completed this part of the step, we will discuss possible substitute tasks (or
         the group may decide that the Sample Task really does occur). After this discussion, a
         consensus will be reached about the best substitute tasks, and these will be written on the
         appropriate lines.

         Look at the EXAMPLE. A group of 63B agreed that "Replace transmission rotor hub assembly"
         was not performed in their MOS, and they reached a consensus, after discussion, that
         "Replace hydrovac in a 5-Ton" was similar in terms of operations performed, skills required, and
         degree of difficulty in performing. The group did think the other two Sample Tasks were
         performed in the 63B MOS, so the "Y" is circled for those two tasks, and no substitutes appear.

Step  2.  After  agreeing  on  Sample  Tasks  or  substitutes,  you  wid  individually  complete  the  second  major 
step,  judging  what  should  be  the  test  score  cutoffs  on  these  tasks  in  order  to  be  viewed  as 
Marginal.  Acceptable,  or  Outstanding  performers  (using  the  Performance  Level  Definitions). 

To help make judgments for the second step, the form provides information about actual soldier
performance on hands-on tests of the Sample Tasks. This test-score information is not based
on SQT scores, where soldiers are allowed to practice repeatedly. The hands-on test scores
referred to here are from specially developed tests that were given with no advance warning
and no practice allowed.

Look at the EXAMPLE again. In the EXAMPLE, 34 out of 100 soldiers score 55 or worse on
the specially developed hands-on tests for these sample tasks. In other words, 34 out of 100
soldiers could correctly perform 55% or fewer of the steps in the hands-on tests.

The judge in this example decided that getting less than 55% correct on these tasks was
Unacceptable and drew his line marking the Unacceptable category below 55. He felt that
scores less than 75 were Marginal; 75 and above Acceptable. Finally, he felt that scores of 95
and better represent Outstanding performance. Nine out of 100 soldiers (100 minus 91) would
be considered outstanding performers, according to this judge.
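
The subtraction in the example (100 minus 91) generalizes to all four categories: each category's share is the difference between the cumulative counts on either side of a drawn line. The following is a minimal sketch of that arithmetic, not part of the original form. It uses the cumulative counts printed on the Vehicle and Equipment Operations form, and the three line positions are hypothetical, chosen only for illustration. The function name `category_counts` and its parameters are assumptions, and each parameter names the highest tabulated score that falls below a drawn line.

```python
# Cumulative counts from the Vehicle and Equipment Operations form:
# test score -> number of soldiers (out of 100) scoring the same or worse.
CUMULATIVE = {
    100: 100, 95: 87, 90: 78, 85: 70, 80: 62, 75: 51, 70: 39, 65: 32,
    60: 24, 55: 18, 50: 11, 45: 9, 40: 6, 35: 5, 30: 3, 25: 2,
    20: 1, 15: 1, 10: 1,
}


def category_counts(top_u, top_m, top_a, cumulative=CUMULATIVE):
    """Split 100 soldiers into U, M, A, and O.

    top_u, top_m, top_a are the highest tabulated scores still inside the
    Unacceptable, Marginal, and Acceptable categories (the rows directly
    under each of the judge's three lines). Hypothetical helper.
    """
    if not top_u < top_m < top_a:
        raise ValueError("cutoff rows must increase from U to M to A")
    unacceptable = cumulative[top_u]
    marginal = cumulative[top_m] - cumulative[top_u]
    acceptable = cumulative[top_a] - cumulative[top_m]
    outstanding = cumulative[100] - cumulative[top_a]
    return {"U": unacceptable, "M": marginal, "A": acceptable, "O": outstanding}


# Hypothetical lines drawn just above the 50, 70, and 90 rows.
print(category_counts(50, 70, 90))
```

Because the counts are cumulative, the four shares always sum to 100 regardless of where the lines are drawn.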

PLEASE  put  your  name  and  the  MOS  you're  rating  in  the  spaces  provided  on  EVERY  page. 

NOTE: As you make your ratings, think about soldiers who have about 24 months of service in this MOS after
Basic and AIT. Also keep in mind all that you know about the full range of duty assignments for this MOS.

EXAMPLE


Task-Based  Standard  Setting  Form 


A.  Mechanical  Systems  Maintenance:  Inspect,  install,  maintain,  or  repair  mechanical  systems. 


Sample Tasks                                        Part of the MOS?
                                                    YES/NO

1. Perform operator maintenance on M16A1 rifle.     (Y)  N

2. Replace transmission in rotor hub assembly.

3. Replace wheel bearings.                          (Y)  N

Actual Hands-On Test-Score Information for these Tasks:


Test Score                         Number of Soldiers Who Score
% of Steps Correctly Performed     the Same or Worse Than This

100                                100 out of 100 soldiers
 95                                 91 out of 100 soldiers
 90                                 82 out of 100 soldiers
 85                                 73 out of 100 soldiers
 80                                 63 out of 100 soldiers
 75                                 57 out of 100 soldiers
 70                                 51 out of 100 soldiers
 65                                 47 out of 100 soldiers
 60                                 42 out of 100 soldiers
 55                                 34 out of 100 soldiers
 50                                 26 out of 100 soldiers
 45                                 25 out of 100 soldiers
 40                                 24 out of 100 soldiers
 35                                 23 out of 100 soldiers
 30                                 21 out of 100 soldiers
 25                                 16 out of 100 soldiers
 20                                 11 out of 100 soldiers
 15                                 10 out of 100 soldiers
 10                                  9 out of 100 soldiers


DRAW 3 LINES THAT MARK THE CUTOFFS BETWEEN THE CATEGORIES.

LABEL THE CATEGORIES: O (Outstanding)

A  (Acceptable) 

M  (Marginal) 

U  (Unacceptable) 


Name: _ 

MOS  You  Are  Rating: 


Task-Based  Standard  Setting  Form 

D. Vehicle and Equipment Operations: Drive or operate heavy mechanical equipment.


Sample Tasks                        Part of the MOS?    Substitute Tasks
                                    YES/NO

1. Start/stop tank engine.          Y  N                1. _____

2. Couple/uncouple semitrailer.     Y  N                2. _____

3. Operate tractor/semitrailer.     Y  N                3. _____

Actual Hands-On Test-Score Information for these Tasks:


Test Score                         Number of Soldiers Who Score
% of Steps Correctly Performed     the Same or Worse Than This

100                                100 out of 100 soldiers
 95                                 87 out of 100 soldiers
 90                                 78 out of 100 soldiers
 85                                 70 out of 100 soldiers
 80                                 62 out of 100 soldiers
 75                                 51 out of 100 soldiers
 70                                 39 out of 100 soldiers
 65                                 32 out of 100 soldiers
 60                                 24 out of 100 soldiers
 55                                 18 out of 100 soldiers
 50                                 11 out of 100 soldiers
 45                                  9 out of 100 soldiers
 40                                  6 out of 100 soldiers
 35                                  5 out of 100 soldiers
 30                                  3 out of 100 soldiers
 25                                  2 out of 100 soldiers
 20                                  1 out of 100 soldiers
 15                                  1 out of 100 soldiers
 10                                  1 out of 100 soldiers


DRAW 3 LINES THAT MARK THE CUTOFFS BETWEEN THE CATEGORIES


LABEL THE CATEGORIES: O (Outstanding)

A  (Acceptable) 

M  (Marginal) 

U  (Unacceptable) 


Name: _

MOS  You  Are  Rating: 


Task-Based  Standard  Setting  Form 

M. Individual Combat: Engage in combat and survival skills; know customs and laws of war.


Sample Tasks                        Part of the MOS?    Substitute Tasks
                                    YES/NO

1. Engage targets with grenades.    Y  N                1. _____

2. Load, reduce stoppage, and       Y  N                2. _____
   clear an M16A1 rifle.

3. Put on protective clothing.      Y  N                3. _____


Actual Hands-On Test-Score Information for these Tasks:


Test Score                         Number of Soldiers Who Score
% of Steps Correctly Performed     the Same or Worse Than This

100                                100 out of 100 soldiers
 95                                 78 out of 100 soldiers
 90                                 63 out of 100 soldiers
 85                                 52 out of 100 soldiers
 80                                 40 out of 100 soldiers
 75                                 32 out of 100 soldiers
 70                                 23 out of 100 soldiers
 65                                 19 out of 100 soldiers
 60                                 14 out of 100 soldiers
 55                                 11 out of 100 soldiers
 50                                  8 out of 100 soldiers
 45                                  7 out of 100 soldiers
 40                                  5 out of 100 soldiers
 35                                  4 out of 100 soldiers
 30                                  2 out of 100 soldiers
 25                                  2 out of 100 soldiers
 20                                  1 out of 100 soldiers
 15                                  1 out of 100 soldiers
 10                                  1 out of 100 soldiers


DRAW 3 LINES THAT MARK THE CUTOFFS BETWEEN THE CATEGORIES


LABEL  THE  CATEGORIES:  O  (Outstanding) 

A  (Acceptable) 

M  (Marginal) 

U  (Unacceptable) 


Name: _ 

MOS  You  Are  Rating: 


Task-Based  Standard  Setting  Form  -  RERATE 

D. Vehicle and Equipment Operations: Drive or operate heavy mechanical equipment.


Sample Tasks                        Part of the MOS?    Substitute Tasks
                                    YES/NO

1. Start/stop tank engine.          Y  N                1. _____

2. Couple/uncouple semitrailer.     Y  N                2. _____

3. Operate tractor/semitrailer.     Y  N                3. _____

Actual Hands-On Test-Score Information for these Tasks:


Test Score                         Number of Soldiers Who Score
% of Steps Correctly Performed     the Same or Worse Than This

100                                100 out of 100 soldiers
 95                                 87 out of 100 soldiers
 90                                 78 out of 100 soldiers
 85                                 70 out of 100 soldiers
 80                                 62 out of 100 soldiers
 75                                 51 out of 100 soldiers
 70                                 39 out of 100 soldiers
 65                                 32 out of 100 soldiers
 60                                 24 out of 100 soldiers
 55                                 18 out of 100 soldiers
 50                                 11 out of 100 soldiers
 45                                  9 out of 100 soldiers
 40                                  6 out of 100 soldiers
 35                                  5 out of 100 soldiers
 30                                  3 out of 100 soldiers
 25                                  2 out of 100 soldiers
 20                                  1 out of 100 soldiers
 15                                  1 out of 100 soldiers
 10                                  1 out of 100 soldiers


DRAW 3 LINES THAT MARK THE CUTOFFS BETWEEN THE CATEGORIES

LABEL THE CATEGORIES: O (Outstanding)

A  (Acceptable) 

M  (Marginal) 

U  (Unacceptable) 


Name: _ 

MOS  You  Are  Rating: 

Task-Based  Standard  Setting  Form  -  RERATE 


M. Individual Combat: Engage in combat and survival skills; know customs and laws of war.


Sample Tasks                        Part of the MOS?    Substitute Tasks
                                    YES/NO

1. Engage targets with grenades.    Y  N                1. _____

2. Load, reduce stoppage, and       Y  N                2. _____
   clear an M16A1 rifle.

3. Put on protective clothing.      Y  N                3. _____


Actual Hands-On Test-Score Information for these Tasks:


Test Score                         Number of Soldiers Who Score
% of Steps Correctly Performed     the Same or Worse Than This

100                                100 out of 100 soldiers
 95                                 78 out of 100 soldiers
 90                                 63 out of 100 soldiers
 85                                 52 out of 100 soldiers
 80                                 40 out of 100 soldiers
 75                                 32 out of 100 soldiers
 70                                 23 out of 100 soldiers
 65                                 19 out of 100 soldiers
 60                                 14 out of 100 soldiers
 55                                 11 out of 100 soldiers
 50                                  8 out of 100 soldiers
 45                                  7 out of 100 soldiers
 40                                  5 out of 100 soldiers
 35                                  4 out of 100 soldiers
 30                                  2 out of 100 soldiers
 25                                  2 out of 100 soldiers
 20                                  1 out of 100 soldiers
 15                                  1 out of 100 soldiers
 10                                  1 out of 100 soldiers


DRAW 3 LINES THAT MARK THE CUTOFFS BETWEEN THE CATEGORIES


LABEL THE CATEGORIES: O (Outstanding)

A  (Acceptable) 

M  (Marginal) 

U  (Unacceptable) 


Name: _


Task  Complexity  Questionnaire 

12B:  Combat  Engineer 


In this exercise, we would like you to provide information about the complexity or difficulty of
sample tasks selected from two fairly general areas. These areas could apply to more than one
MOS; some examples are Individual Combat, Vehicle and Equipment Operation, and Communication.

For each of the two tasks presented, there are 10 questions about the task. For several questions,
there are definitions and examples to clarify the meaning of the question. Please read all
definitions and examples before selecting an answer.

NOTE:  If  the  sample  task  is  not  performed  in  the  MOS  you  are  rating,  please  use  the  substitute 
task  you  used  in  the  standard  setting  exercise. 


Task  Category:  D.  Vehicle  and  Equipment  Operations  -  Drive  or  operate  heavy  mechanical 
equipment 


Sample  Task:  Operate  tractor/semitrailer 

For the Vehicle and Equipment Operations task listed here, please answer the following 10
questions. The answers to these 10 questions will provide information on the complexity of the
task that is performed by soldiers in the MOS you are rating.

Please circle the most appropriate answer to each question.


1. Are job or memory aids used by the soldier in performing this task?

a. Yes

b. No (Go to No. 3 if you answer "No" to this question)

Job and memory aids include memory joggers learned in school (e.g., S-A-L-U-T-E), instructions
printed on or attached to equipment, checklists or worksheets, and manuals that are routinely
used while performing the task.

2.  How  would  you  rate  the  quality  of  the  job  or  memory  aid? 

a.  There  are  no  job  or  memory  aids  for  this  task. 

b.  Poor.  Even  with  the  job/memory  aid,  a  typical  soldier  would  need  a  great 
deal  of  additional  information. 

c.  Marginally  Good.  Even  with  the  job/memory  aid,  a  soldier  would  need 
important  additional  information. 

d.  Very  Good.  With  the  job/memory  aid,  a  soldier  would  need  only  a  little 
additional  information. 

e. Excellent. Using the job/memory aid, a soldier can do the entire task correctly
with no additional information or help.

3.  Into  how  many  steps  is  this  task  typically  divided? 

a.  1  Step 

b.  2-5  Steps 

c.  6-10  Steps 

d.  More  than  10  Steps 

A step is a separate physical or mental activity within a task which has a well defined,
observable beginning and ending point.

4.  Are  the  steps  in  this  task  required  to  be  performed  in  a  definite  sequence? 

a.  The  tasks  typically  have  only  1  step. 

b.  None  are  required  to  be  performed  in  a  particular  sequence. 

c.  Some,  but  not  all  steps  must  be  performed  in  the  correct  sequence. 

d.  All  of  the  steps  must  be  performed  in  the  correct  sequence. 

5.  Does  the  task  provide  built-in  feedback  so  that  you  can  tell  if  you  are  doing  them 
correctly? 

a.  Built-in  feedback  is  provided  for  all  steps 

b.  Built-in  feedback  is  provided  for  most  steps  ( >  50%  ) 

c.  Built-in  feedback  is  provided  for  only  a  few  steps 

d.  No  Built-in  feedback  is  provided  for  any  steps. 

Examples  of  built-in  feedback  include  disassembling  equipment  where  removing  one  section 
automatically  uncovers  the  next  section;  steps  with  observable  effects  such  as  buzzers,  meter 
readings,  warning  lights;  and  operating  equipment  built  to  indicate  a  logical  progression  (for 
example, adjusting dials from left-to-right).

6.  Does  the  task  or  parts  of  the  task  have  a  time  limit  for  its  completion? 

a.  There  are  no  time  limits 

b.  There  are  time  limits  that  are  fairly  easy  to  meet  under  test  conditions 

c. There are time limits that are difficult to meet under test conditions.

7.  How  difficult  are  the  mental  processing  (thinking,  analyzing,  judging,  inferring,  and 
problem  solving)  requirements  of  this  task? 

a.  Almost  no  mental  processing  is  required  (physical  or  highly  repetitive  tasks) 

b.  Simple  mental  processing  is  required  (gross  comparisons,  simple  estimations  or 
calculations) 

c. Complex mental processing is required (choices or decisions based on subtle but
discrete clues)

d.  Very  complex  mental  processing  is  required  (rapid  decisions,  based  on  detailed 
information,  often  under  stress) 

8. How many facts, terms, names, rules, or ideas must a soldier memorize in order to
do this task?

a. None (or all are provided by memory/job aids)

b.  A  few  (1-3) 

c.  Some  (4-8) 

d.  Very  Many  (more  than  8) 

9.  How  hard  are  the  facts  or  terms  that  must  be  remembered? 

a.  There  are  not  facts  or  terms  to  be  remembered 

b.  Not  at  all  hard  -  the  information  is  simple 

c.  Somewhat  hard  -  some  of  the  information  is  complex 

d.  Very  hard  -  the  facts,  rules,  and  terms  are  technical  or  specific  to  the  task  and 
must  be  remembered  in  exact  detail 

10.  What  are  the  motor  control  demands  of  this  task? 

a.  None 

b.  Small,  but  noticeable  degree  of  motor  control  is  required  (such  as  driving  a  nail, 
adjusting  a  dial) 

c.  Considerable  degree  of  motor  control  is  needed  (such  as  typing,  driving  a  manual 
shift  vehicle,  or  tracking  a  moving  target) 

d. A very large degree of motor control is needed (such as repair of delicate equipment,
or sending Morse code using a key)

Task Category: M. Individual Combat - Engage in combat and survival skills; know customs
and laws of war.


Sample  Task:  Put  on  Protective  Clothing 

For the Individual Combat task listed here, please answer the following 10 questions. The
answers to these 10 questions will provide information on the complexity of the task that is
performed by soldiers in the MOS you are rating.

Please  circle  the  most  appropriate  answer  to  each  question. 

1. Are job or memory aids used by the soldier in performing this task?

a. Yes

b. No (Go to No. 3 if you answer "No" to this question)

Job and memory aids include memory joggers learned in school (e.g., S-A-L-U-T-E), instructions
printed on or attached to equipment, checklists or worksheets, and manuals that are routinely
used while performing the task.

2.  How  would  you  rate  the  quality  of  the  job  or  memory  aid? 

a.  There  are  no  job  or  memory  aids  for  this  task. 

b.  Poor.  Even  with  the  job/memory  aid,  a  typical  soldier  would  need  a  great 
deal  of  additional  information. 

c.  Marginally  Good.  Even  with  the  job/memory  aid,  a  soldier  would  need 
important  additional  information. 

d.  Very  Good.  With  the  job/memory  aid,  a  soldier  would  need  only  a  little 
additional  information. 

e. Excellent. Using the job/memory aid, a soldier can do the entire task correctly
with no additional information or help.

3.  Into  how  many  steps  is  this  task  typically  divided? 

a.  1  Step 

b.  2-5  Steps 

c.  6-10  Steps 

d.  More  than  10  Steps 

A step is a separate physical or mental activity within a task which has a well defined,
observable beginning and ending point.

4.  Are  the  steps  in  this  task  required  to  be  performed  in  a  definite  sequence? 

a. The tasks typically have only 1 step.

b.  None  are  required  to  be  performed  in  a  particular  sequence. 

c.  Some,  but  not  all  steps  must  be  performed  in  the  correct  sequence. 

d. All of the steps must be performed in the correct sequence.

5. Does the task provide built-in feedback so that you can tell if you are doing them
correctly?

a. Built-in feedback is provided for all steps

b. Built-in feedback is provided for most steps ( > 50% )

c. Built-in feedback is provided for only a few steps

d. No Built-in feedback is provided for any steps.

Examples  of  built-in  feedback  include  disassembling  equipment  where  removing  one  section 
automatically  uncovers  the  next  section;  steps  with  observable  effects  such  as  buzzers,  meter 
readings,  warning  lights;  and  operating  equipment  built  to  indicate  a  logical  progression  (for 
example,  adjusting  dials  from  left-to-right). 


6.  Does  the  task  or  parts  of  the  task  have  a  time  limit  for  its  completion? 

a.  There  are  no  time  limits 

b.  There  are  time  limits  that  are  fairly  easy  to  meet  under  test  conditions 

c. There are time limits that are difficult to meet under test conditions.

7. How difficult are the mental processing (thinking, analyzing, judging, inferring, and
problem solving) requirements of this task?

a.  Almost  no  mental  processing  is  required  (physical  or  highly  repetitive  tasks) 

b. Simple mental processing is required (gross comparisons, simple estimations or
calculations)

c.  Complex  mental  processing  is  required  (choices  or  decisions  based  on  subtle  but 
discrete  clues) 

d.  Very  complex  mental  processing  is  required  (rapid  decisions,  based  on  detailed 
information,  often  under  stress) 

8.  How  many  facts,  terms,  names,  rules,  or  ideas  must  a  soldier  memorize  in  order  to 
do  this  task? 

a. None (or all are provided by memory/job aids)

b.  A  few  (1-3) 

c.  Some  (4-8) 

d.  Very  Many  (more  than  8) 

9.  How  hard  are  the  facts  or  terms  that  must  be  remembered? 

a.  There  are  not  facts  or  terms  to  be  remembered 

b.  Not  at  all  hard  -  the  information  is  simple 

c. Somewhat hard - some of the information is complex

d. Very hard - the facts, rules, and terms are technical or specific to the task and
must be remembered in exact detail.

10.  What  are  the  motor  control  demands  of  this  task? 

a.  None 

b.  Small,  but  noticeable  degree  of  motor  control  is  required  (such  as  driving  a  nail, 
adjusting  a  dial) 

c.  Considerable  degree  of  motor  control  is  needed  (such  as  typing,  driving  a  manual 
shift  vehicle,  or  tracking  a  moving  target) 

d. A very large degree of motor control is needed (such as repair of delicate equipment,
or sending Morse code using a key)

Army Task Questionnaire: Mean Ratings of
Each Component by MOS and Rating Scales


Table B.1

Army Task Questionnaire Mean Ratings: 12B - Combat Engineer


                                                   Rating
Task Categories                     FRE     CTI     GSI     OJI     DIF
                                    N=77    N=77    N=77    N=77    N=77

Handle demolitions/mines            4.13    4.54    2.98    4.27    3.68
Operate wheeled vehicle             3.97    3.67    3.57    3.93    2.76
Perform op maint chcks/svcs         3.94    3.71    3.98    4.13    2.71
Fire individual weapons             3.90    3.87    4.37    4.44    3.01
Survive in the field                3.90    3.81    4.13    4.31    3.37
Use maps                            3.89    4.20    4.07    4.37    3.26
Place/camoufl tact equip/mater      3.81    4.05    3.71    4.07    3.14
Move/react in the field             3.76    3.63    3.92    4.07    3.24
Send/receive radio messages         3.71    3.75    4.07    4.06    2.94
Protect against NBC hazards         3.66    3.54    4.22    4.20    3.45
Perform op chcks/svcs on weap       3.63    3.69    4.30    4.27    2.59
Navigate                            3.54    3.84    4.13    4.31    3.67
Act as a model                      3.54    3.37    3.94    4.01    3.33
Read tech manl/field manl/etc       3.49    3.71    3.70    3.89    2.72
Give directions/instructions        3.33    3.42    3.51    3.63    2.98
Sketch maps/overlays/range cards    3.24    3.45    3.32    3.75    3.29
Communicate                         3.23    3.10    3.53    3.58    3.05
Give short oral reports             3.20    3.27    3.37    3.48    2.93
Give first aid                      3.16    3.46    4.22    4.13    3.42
Lead                                3.14    3.42    3.63    3.81    3.54
Operate track vehicle               2.88    2.72    2.39    2.97    2.58
Use hand & arm signals              2.80    2.84    3.05    3.20    2.31
Monitor/inspect                     2.75    2.87    3.07    3.20    2.81
Counsel                             2.71    2.55    3.06    3.09    3.01
Use hand grenades                   2.71    3.07    3.63    3.62    2.23
Detect/identify targets             2.67    2.81    3.18    3.32    3.10
Train                               2.67    2.80    3.07    3.10    2.93
Direct/lead teams                   2.55    3.00    3.03    3.35    3.41
Engage in hand-to-hand combat       2.37    2.59    2.87    2.98    3.11
Plan placement/use tact equip       2.33    2.78    2.60    3.02    2.87
Control individuals/crowds          2.32    2.32    2.78    2.88    2.56
Pack/load materials                 2.27    2.11    2.35    2.55    2.63
Know customs/laws of war            2.26    2.39    2.93    2.96    2.59
Construct wooden bldgs/struct       2.22    2.87    1.50    2.55    3.29
Paint                               1.98    1.23    1.31    1.36    1.26
Plan operations                     1.98    2.28    2.31    2.41    3.01
Decode data                         1.98    2.50    2.63    2.70    3.06
Compute statistics/other math       1.97    2.53    1.75    2.45    2.83

(table  continues) 




Table B.1 (continued)


                                                   Rating
Task Categories                     FRE     CTI     GSI     OJI     DIF
                                    N=77    N=77    N=77    N=77    N=77

Operate power excavating equip      1.94    2.54    1.39    2.27    2.70
Assemble steel structures           1.90    2.57    1.51    2.34    2.94
Install electronic components       1.89    1.96    2.22    2.20    2.07
Operate gas/electric power equip    1.76    2.03    1.44    1.96    2.24
Provide counseling                  1.64    1.62    1.80    1.90    1.86
Personnel Administration            1.58    1.45    1.83    1.88    2.05
Operate electronic equipment        1.51    1.85    2.14    2.19    1.90
Operate boats                       1.44    1.75    1.10    1.70    2.24
Conduct land surveys                1.36    1.71    1.20    1.63    2.01
Construct masonry bldgs/struct      1.33    1.97    1.02    1.79    2.79
Repair mechanical systems           1.33    1.51    1.63    1.68    2.40
Install wire/cables                 1.28    1.42    1.50    1.55    1.42
Order equipment/supplies            1.14    1.31    1.33    1.51    1.55
Troubleshoot weapons                1.06    1.44    1.64    1.59    1.88
Prep technical forms/documents      0.93    1.11    1.14    1.31    1.50
Operate lift/load/grade equip       0.92    1.14    0.63    1.07    1.90
Record/file/dispatch information    0.88    0.85    1.03    1.09    1.37
Receive/store/issue supp/equip      0.87    0.93    1.11    1.19    1.26
Analyze weather conditions          0.85    0.93    0.90    1.00    1.18
Draw maps/overlays                  0.75    0.77    0.72    0.84    1.24
Write/deliver presentations         0.74    0.86    0.88    0.96    1.39
Prep equip/supplies for air drop    0.74    1.00    0.85    1.05    1.48
Type                                0.72    0.61    0.80    0.71    1.50
Troubleshoot mechanical systems     0.70    1.02    0.96    1.09    1.36
Write documents/correspondence      0.62    0.70    0.83    0.85    1.14
Determine fire data-indirect weap   0.59    0.72    0.83    0.87    1.01
Analyze intelligence data           0.59    0.80    0.83    0.84    1.14
Draw illustrations                  0.49    0.58    0.44    0.54    0.88
Use audiovisual equipment           0.45    0.45    0.50    0.58    0.67
Repair weapons                      0.41    0.54    0.68    0.62    0.88
Repair metal                        0.40    0.54    0.25    0.44    0.97
Reproduce printed material          0.40    0.31    0.42    0.41    0.36
Fire heavy direct fire weapons      0.35    0.46    0.46    0.53    0.64
Produce technical drawings          0.32    0.44    0.20    0.33    0.98
Interview                           0.31    0.38    0.35    0.42    0.63
Estimate time/cost of maint ops     0.28    0.37    0.29    0.39    0.53
Operate computer hardware           0.27    0.29    0.31    0.36    0.84
Install pipe assemblies             0.27    0.39    0.18    0.35    0.71
Prepare parachutes                  0.24    0.32    0.35    0.39    0.57
Inspect electrical systems          0.24    0.45    0.41    0.42    0.50
Repair electrical systems           0.23    0.33    0.33    0.32    0.55
Repair plastic/fiberglass           0.14    0.73    0.18    0.22    0.45
Control money                       0.14    0.10    0.09    0.11    0.16
Fire indirect fire weapons          0.14    0.15    0.19    0.18    0.22
Receive clients/patients/guests     0.13    0.13    0.11    0.11    0.14

(table  continues) 


Table B.1 (continued)


                                                   Rating
Task Categories                     FRE     CTI     GSI     OJI     DIF
                                    N=77    N=77    N=77    N=77    N=77

Repair electronic components        0.13    0.14    0.14    0.15    0.28
Analyze electronic signals          0.10    0.06    0.13    0.13    0.28
Control air traffic                 0.10    0.11    0.14    0.14    0.18
Inspect electronic systems          0.09    0.14    0.19    0.18    0.19
Prep heavy weap for tactical use    0.09    0.09    0.09    0.07    0.13
Provide medical/dental treatment    0.06    0.06    0.06    0.05    0.10
Load/unload artillery/tank guns     0.03    0.05    0.07    0.07    0.11
Provide programming/DP support      0.03    0.03    0.03    0.03    0.13
Translate foreign languages         0.03    0.06    0.05    0.05    0.19
Cook                                0.01    0.00    0.01    0.01    0.02
Operate radar                       0.00    0.00    0.00    0.00    0.00
Perform medical lab procedures      0.00    0.00    0.00    0.00    0.00
Select/lay/clean med/dent equip     0.00    0.00    0.00    0.00    0.00

Note. FRE = Frequency
      CTI = Core Technical Importance
      GSI = General Soldiering Importance
      OJI = Overall Importance
      DIF = Difficulty
      N = the number of participants on which the task means
          for that rating were based


Table B.2

Army Task Questionnaire Mean Ratings: 13B - Cannon Crewman


                                                   Rating
Task Categories                     FRE     CTI     GSI     OJI     DIF
                                    N=69    N=69    N=69    N=69    N=66

Load/unload artillery/tank guns     4.55    4.72    2.66    4.10    2.97
Fire indirect fire weapons          4.52    4.63    2.73    4.18    3.42
Prep heavy weap for tactical use    4.34    4.59    2.73    4.08    3.22
Place/camoufl tact equip/mater      4.11    3.98    3.73    3.95    2.83
Operate wheeled vehicle             4.05    4.04    3.66    4.08    2.68
Perform op maint chcks/svcs         3.97    4.26    3.84    4.26    3.15
Perform op chcks/svcs on weap       3.72    4.33    4.15    4.36    3.21
Protect against NBC hazards         3.65    3.92    4.44    4.39    3.34
Survive in the field                3.62    3.94    4.13    4.17    3.28
Read tech manl/field manl/etc       3.60    3.79    3.68    3.82    2.84
Fire individual weapons             3.56    3.81    4.42    4.30    2.86
Use maps                            3.36    3.63    3.85    3.84    3.27
Sketch maps/overlays/range cards    3.29    3.36    3.37    3.47    3.00
Act as a model                      3.25    3.25    3.51    3.61    3.09
Operate track vehicle               3.18    3.37    2.78    3.20    2.34
Use hand & arm signals              3.17    3.40    3.21    3.34    2.33
Lead                                3.05    3.01    3.23    3.34    2.77
Pack/load materials                 2.98    3.07    2.65    3.08    2.59
Give first aid                      2.97    3.34    4.24    4.14    3.39
Give directions/instructions        2.97    3.08    3.26    3.29    2.47
Counsel                             2.76    2.69    3.04    3.07    2.86
Train                               2.70    2.86    2.91    3.01    2.53
Communicate                         2.69    2.95    3.20    3.23    2.67
Navigate                            2.69    3.43    3.62    3.63    3.36
Know customs/laws of war            2.59    2.42    3.02    2.98    2.56
Give short oral reports             2.55    2.71    3.21    3.05    2.30
Monitor/inspect                     2.52    2.60    2.81    2.87    2.53
Move/react in the field             2.50    2.91    3.29    3.20    2.56
Send/receive radio messages         2.49    2.89    3.27    3.21    2.71
Control individuals/crowds          2.49    2.30    2.91    2.72    2.24
Fire heavy direct fire weapons      2.46    2.92    1.92    2.60    2.24
Install wire/cables                 2.43    2.53    2.20    2.60    1.69
Troubleshoot weapons                2.40    3.40    3.04    3.36    3.23
Detect/identify targets             2.23    2.85    2.98    3.01    2.83
Install electronic components       2.16    2.30    2.38    2.54    2.25
Paint                               2.14    1.39    1.50    1.52    1.22
Use hand grenades                   2.01    2.71    3.40    3.24    2.10
Repair mechanical systems           2.00    2.98    2.46    2.75    3.01
Plan placement/use tact equip       1.92    2.42    2.44    2.41    2.21
Direct/lead teams                   1.84    2.21    2.27    2.29    1.93
Personnel Administration            1.77    1.88    1.91    2.02    1.92
Order equipment/supplies            1.71    1.94    2.15    2.18    2.40

(table  continues) 


Table B.2 (continued)


                                                   Rating
Task Categories                     FRE     CTI     GSI     OJI     DIF
                                    N=69    N=69    N=69    N=69    N=66

Handle demolitions/mines            1.66    2.10    2.23    2.30    2.31
Provide counseling                  1.62    1.81    1.98    2.05    1.90
Prep technical forms/documents      1.60    2.02    1.65    1.92    2.18
Troubleshoot mechanical systems     1.58    2.24    1.89    2.05    2.40
Repair weapons                      1.56    2.53    2.01    2.40    2.51
Operate electronic equipment        1.55    2.01    2.10    2.20    2.01
Determine fire data-indirect weap   1.52    1.89    1.31    1.87    1.98
Engage in hand-to-hand combat       1.46    1.56    2.07    2.00    2.37
Conduct land surveys                1.43    1.77    1.55    1.76    1.84
Decode data                         1.42    2.07    2.30    2.30    2.59
Plan operations                     1.24    1.47    1.52    1.58    1.80
Receive/store/issue supp/equip      1.07    1.14    1.25    1.26    1.30
Record/file/dispatch information    1.07    1.14    1.33    1.33    1.45
Prep equip/supplies for air drop    1.02    1.31    0.97    1.29    1.47
Compute statistics/other math       0.95    1.15    1.05    1.23    1.12
Inspect electrical systems          0.82    1.07    1.00    1.02    1.38
Operate gas/electric power equip    0.78    0.94    0.84    0.94    1.16
Write documents/correspondence      0.63    0.73    0.89    0.94    1.19
Type                                0.63    0.65    0.68    0.84    1.18
Repair electrical systems           0.58    0.70    0.70    0.76    1.13
Inspect electronic systems          0.57    0.69    0.67    0.76    1.06
Reproduce printed material          0.56    0.40    0.52    0.49    0.60
Analyze weather conditions          0.52    0.55    0.57    0.60    0.84
Interview                           0.50    0.71    0.85    0.76    0.72
Operate computer hardware           0.49    0.49    0.53    0.55    0.90
Analyze intelligence data           0.45    0.57    0.67    0.60    0.76
Write/deliver presentations         0.44    0.53    0.32    0.60    0.77
Repair electronic components        0.44    0.44    0.47    0.55    0.83
Draw maps/overlays                  0.43    0.59    0.68    0.68    0.74
Use audiovisual equipment           0.37    0.30    0.44    0.37    0.47
Repair metal                        0.34    0.37    0.39    0.38    0.65
Control money                       0.33    0.39    0.40    0.40    0.50
Prepare parachutes                  0.31    0.46    0.44    0.47    0.51
Construct wooden bldgs/struct       0.31    0.31    0.34    0.30    0.43
Estimate time/cost of maint ops     0.29    0.30    0.29    0.30    0.45
Assemble steel structures           0.26    0.29    0.36    0.32    0.35
Receive clients/patients/guests     0.73    0.18    0.27    0.24    0.27
Provide programming/DP support      0.22    0.27    0.27    0.30    0.27
Draw illustrations                  0.20    0.22    0.23    0.25    0.23
Operate lift/load/grade equip       0.20    0.24    0.24    0.24    0.33
Provide medical/dental treatment    0.18    0.24    0.26    0.26    0.25
Repair plastic/fiberglass           0.17    0.18    0.17    0.20    0.33
Control air traffic                 0.15    0.23    0.21    0.24    0.31
Install pipe assemblies             0.15    0.13    0.18    0.17    0.18

(table  continues) 




Table B.2 (continued)

                                                  Rating
Task Categories                       FRE    CTI    GSI    OJI    DIF
                                      N=69   N=69   N=69   N=69   N=66

Cook                                  0.14   0.18   0.26   0.24   0.19
Construct masonry bldgs/struct        0.11   0.14   0.13   0.14   0.19
Operate power excavating equip        0.10   0.13   0.15   0.13   0.18
Operate radar                         0.08   0.13   0.10   0.11   0.22
Analyze electronic signals            0.08   0.08   0.10   0.10   0.12
Translate foreign languages           0.08   0.11   0.13   0.13   0.25
Perform medical lab procedures        0.07   0.10   0.11   0.11   0.12
Produce technical drawings            0.07   0.08   0.13   0.11   0.16
Select/lay/clean med/dent equip       0.05   0.04   0.07   0.07   0.04
Operate boats                         0.00   0.00   0.00   0.00   0.00

Note. FRE = Frequency
      CTI = Core Technical Importance
      GSI = General Soldiering Importance
      OJI = Overall Importance
      DIF = Difficulty
      N = the number of participants on which the task means
          for that rating were based




Table  B.3 

Army Task Questionnaire Mean Ratings: 27E - TOW/Dragon Repairer


Task  Categories 


Inspect  electronic  systems 
Read  tech  manl/field  manl/etc
Inspect  electrical  systems 
Repair  electrical  systems 
Repair  electronic  components 
Perform  op  maint  chcks/svcs 
Act  as  a  model 
Troubleshoot  weapons 
Install  electronic  components 
Operate  wheeled  vehicle 
Repair  mechanical  systems 
Operate  electronic  equipment 
Communicate 
Lead 

Repair  weapons 

Monitor/inspect 

Give  directions/instructions

Counsel 

Fire  individual  weapons 
Protect  against  NBC  hazards 
Prep  technical  forms/documents
Train 

Perform  op  chcks/svcs  on  weap 
Order  equipment/supplies 
Survive  in  the  field 
Give  first  aid 
Use  maps 

Personnel  Administration 
Troubleshoot  mechanical  systems 
Send/receive  radio  messages 
Repair  plastic/fiberglass 
Record/file/dispatch  information 
Place/camoufl  tact  equip/mater 
Sketch  maps/overlays/range  cards
Navigate 
Plan  operations 

Operate  gas/electric  power  equip 
Know  customs/laws  of  war
Plan  placement/use  tact  equip 
Receive/store/issue  supp/equip 
Move/react  in  the  field 
Give  short  oral  reports


FRE  CTI  GSI  OJI  DIF

N=34  N=34  N=34  N=34  N=33

4.23 

4.38 

1.73 

3.64 

3.54 

4.23 

4.41 

3.76 

4.08 

2.75 

3.94 

4.02 

1.82 

3.47 

3.39 

3.94 

4.11 

1.79 

3.47 

3.33 

3.88 

4.26 

1.76 

3.61 

3.36 

3.67 

3.58 

3.79 

3.85 

2.87 

3.61 

3.55 

4.02 

4.02 

3.06 

3.55 

3.70 

2.20 

3.41 

3.09 

3.52 

3.73 

2.00 

3.35 

3.00 

3.47 

3.55 

3.52 

3.76 

2.36 

3.47 

3.55 

2.02 

3.05 

2.97 

3.44 

4.00 

2.64 

3.61 

3.18 

3.26 

3.35 

3.70 

3.73 

2.81 

3.18 

3.09 

3.45 

3.45 

2.87 

3.17 

3.11 

1.85 

2.82 

2.84 

3.12 

3.30 

3.27 

3.36 

2.81 

2.88 

2.91 

3.00 

3.02 

2.54 

2.88 

2.64 

3.14 

3.14 

2.51 

2.85 

2.05 

4.26 

4.08 

2.54 

2.79 

2.73 

4.23 

4.23 

3.15 

2.73 

3.00 

1.97 

2.76 

2.45 

2.70 

2.85 

3.14 

3.29 

2.60 

2.61 

.67 

4.11 

3.88 

2.69 

2.47 

<7 

1.76 

2.73 

2.45 

2.44 

1.29

3.76 

3.64 

2.97 

2.42 

2.15 

3.93 

3.81 

3.19 

2.35 

2.70 

3.81 

3.67 

3.06 

2.08 

2.11 

2.44 

2.55 

2.00 

2.05 

1.97 

1.23 

1.85 

1.90 

2.02 

2.14 

3.39 

3.20 

2.48 

2.00 

2.45 

1.27 

2.00 

1.90 

1.88 

2.05 

1.61 

2.05 

2.06 

1.82 

1.94 

2.55 

2.50 

1.90 

1.82 

1.58 

2.91 

2.73 

2.84 

1.79 

1.97 

3.00 

3.00 

2.57 

1.78 

2.12 

2.36 

2.39 

2.15 

1.76 

2.26 

1.64 

2.20 

1.97 

1.64 

1.02 

2.67 

2.52 

2.06 

1.64 

1.56 

2.50 

2.37 

2.03 

1.61 

1.76 

1.26 

1.70 

1.66 

1.61 

1.21 

2.69 

2.57 

2.09 

1.58 

1.23 

2.32 

2.11 

1.75 

(table continues)
Table B.3 (continued)


Rating 


Task  Categories 

FRE 

N-34 

CTI 

N-34 

GSI 

N-34 

OJI 

N-34 

DIF 

N-33 

Detect/identify  targets 

1.57 

1.18 

2.72 

2.51 

2.18 

Pack/load  materials 

1.52 

1.73 

1.94 

2.05 

2.00 

Direct/lead  teams 

1.51 

1.69 

2.27 

2.27 

1.90 

Provide  counseling 

1.38 

1.51 

1.78 

1.87 

1.59 

Estimate  time/cost  of  maint  ops 

1.38 

1.67 

0.97 

1.52 

1.45 

Control  individuals/crowds

1.29 

0.52 

1.85 

1.73 

1.69 

Paint 

1.23 

1.35 

1.23 

1.50 

1.18 

Install  wire/cables 

1.23 

1.55 

1.23 

1.41 

1.54 

Operate  computer  hardware 

1.23 

1.47 

0.94 

1.44 

2.03 

Use  hand  &  arm  signals 

1.14 

0.79 

2.09 

1.90 

1.48 

Use  hand  grenades 

0.94 

0.70 

2.41 

2.14 

1.51 

Type 

0.88 

0.79 

0.79 

0.97 

1.33 

Write  documents/correspondence

0.88 

0.97 

0.91 

1.00 

1.18 

Write/deliver  presentations 

0.76 

0.78 

0.97 

1.00 

0.96 

Compute  statistics/other  math

0.73 

1.17 

0.76 

1.11 

1.42 

Operate  track  vehicle 

0.70 

0.88 

0.82 

0.94 

1.18 

Decode  data 

0.64

0.85 

1.26 

1.26 

1.30 

Repair  metal 

0.64 

0.91 

0.55 

0.91 

1.06 

Engage  in  hand-to-hand  combat 

0.61 

0.29 

1.23 

1.05 

1.39 

Use  audiovisual  equipment 

0.58 

0.47 

0.55 

0.64 

0.66 

Conduct  land  surveys 

0.55 

0.55 

1.14 

1.08 

0.97 

Prep  equip/supplies  for  air  drop 

0.50 

0.44 

0.67 

0.64 

0.78 

Reproduce  printed  material 

0.47 

0.35 

0.38 

0.47 

0.48 

Handle  demolitions/mines

0.47 

0.29 

0.91 

0.85 

1.00 

Prep  heavy  weap  for  tactical  use 

0.44 

0.55 

0.64 

0.64 

0.57 

Fire  heavy  direct  fire  weapons 

0.41 

0.41 

0.55 

0.61 

0.60 

Receive  clients/patients/guests

0.32 

0.26 

0.23 

0.35 

0.24 

Interview 

0.32 

0.38 

0.29 

0.41 

0.42 

Draw  maps/overlays

0.26 

0.17 

0.20 

0.20 

0.27 

Draw  illustrations 

0.26 

0.3? 

0.29

0.38 

0.36 

Provide  programming/DP  support 

0.20 

0.2u 

0.32 

0.35 

0.39 

Produce  technical  drawings 

0.17 

0.17 

0.05 

0.14 

0.21 

Analyze  intelligence  data 

0.14 

0.20 

0.38 

0.32 

0.33 

Determine  fire  data-indirect  weap 

0.14 

0.05 

0.17 

0.08 

0.06 

Analyze  electronic  signals 

0.14 

0.17 

0.11 

0.20 

0.21 

Construct  wooden  bldgs/struct

0.14 

0.05 

0.14 

0.11 

0.24 

Translate  foreign  languages 

0.08 

0.05 

0.05 

0.08 

0.12 

Analyze  weather  conditions 

0.08 

0.11 

0.23 

0.23 

0.21 

Operate  radar 

0.08 

0.14 

0.08 

0.11 

0.09 

Prepare  parachutes 

0.08 

0.02 

0.14 

0.14 

0.18 

Assemble  steel  structures 

0.05 

0.05 

0.05 

0.08 

0.15 

Operate  lift/load/grade  equip 

0.05 

0.05 

0.05 

0.05 

0.18 

Control  money 

0.02 

0.02 

0.02 

0.02 

0.00 

Select/lay/clean  med/dent  equip

0.02 

0.05 

0.00 

0.05 

0.06 

(table continues)
Table B.3 (continued)

                                                  Rating
Task Categories                       FRE    CTI    GSI    OJI    DIF
                                      N=34   N=34   N=34   N=34   N=33

Operate power excavating equip        0.02   0.00   0.00   0.00   0.00
Load/unload artillery/tank guns       0.02   0.02   0.08   0.08   0.09
Fire indirect fire weapons            0.00   0.00   0.00   0.00   0.00
Cook                                  0.00   0.00   0.00   0.00   0.00
Operate boats                         0.00   0.00   0.00   0.00   0.00
Control air traffic                   0.00   0.00   0.00   0.00   0.00
Provide medical/dental treatment      0.00   0.00   0.00   0.00   0.00
Install pipe assemblies               0.00   0.00   0.00   0.00   0.00
Perform medical lab procedures        0.00   0.00   0.00   0.00   0.00
Construct masonry bldgs/struct        0.00   0.00   0.00   0.00   0.00

Note. FRE = Frequency
      CTI = Core Technical Importance
      GSI = General Soldiering Importance
      OJI = Overall Importance
      DIF = Difficulty
      N = the number of participants on which the task means
          for that rating were based


Table  B.4 

Army Task Questionnaire Mean Ratings: 29E - Radio Repairer


Task  Categories 


Inspect  electronic  systems 
Operate  electronic  equipment 
Repair  electronic  components 
Repair  electrical  systems 
Install  electronic  components 
Read  tech  manl/field  manl/etc 
Inspect  electrical  systems 
Perform  op  maint  chcks/svcs 
Act  as  a  model 
Communicate 

Operate  wheeled  vehicle 

Prep  technical  forms/documents

Send/receive  radio  messages 

Lead 

Train 

Counsel 

Fire  individual  weapons 
Perform  op  chcks/svcs  on  weap 
Use  maps 

Order  equipment/supplies 

Protect  against  NBC  hazards 

Give  directions/instructions 

Operate  gas/electric  power  equip

Monitor/inspect 

Give  first  aid 

Survive  in  the  field 

Install  wire/cables 

Navigate 

Personnel  Administration 
Receive/store/issue  supp/equip 
Pack/load  materials
Record/file/dispatch  information 
Estimate  time/cost  of  maint  ops 
Repair  mechanical  systems 
Move/react  in  the  field 
Give  short  oral  reports 
Know  customs/laws  of  war 
Provide  counseling 
Type 

Operate  computer  hardware
Use  hand  &  arm  signals 
Control  individuals/crowds 


Rating

FRE  CTI  GSI  OJI  DIF

N=49  N=49  N=49  N=49  N=48

4.44 

4.71 

1.95 

3.91 

3.85 

4.40 

4.40 

2.71 

4.00 

3.12 

4.32 

4.65 

1.89 

4.00 

3.91 

4.28 

4.42 

1.95 

3.85 

3.77 

4.22 

4.46 

2.40 

3.83 

3.16 

4.08 

4.49 

3.49 

4.10 

3.00 

4.06 

4.46 

1.89 

3.73 

3.52 

3.20 

2.87 

3.38 

3.61 

2.41 

3.10 

3.10 

3.62 

3.62 

2.83 

2.87 

2.77 

3.38 

3.49 

2.72 

2.81 

2.55 

3.28 

3.38 

2.27 

2.67 

2.91 

1.93 

2.83 

2.53 

2.65 

2.69 

3.53 

3.44 

2.52 

2.55 

2.77 

3.22 

3.24 

2.81 

2.44 

2.62 

3.00 

3.12 

2.61 

2.34 

2.02 

2.52 

2.62 

2.27 

2.26 

1.81 

4.20 

3.83 

2.62 

2.24 

2.36 

4.14 

3.89 

2.23 

2.24 

2.14 

3.75 

3.45 

2.83 

2.22 

2.63 

1.91 

2.48 

2.39 

2.20 

1.87 

3.83 

3.59 

2.77 

2.18 

2.26 

2.73 

2.75 

2.29 

2.16 

2.55 

2.00 

2.73 

2.29 

2.08 

1.95 

2.20 

2.40 

1.95 

1.91 

2.14 

3.87 

3.71 

3.20 

1.89 

1.87 

3.65 

3.44 

2.89 

1.89 

2.35 

1.97 

2.45 

1.97

1.77 

1.71 

3.51 

3.32 

3.02 

1.59 

1.49 

1.81 

1.87 

1.66 

1.55 

1.93 

1.30 

1.83 

1.70 

1.51 

1.69 

2.04 

2.30 

2.18 

1.42 

1.42 

1.26 

1.59 

1.52 

1.40 

1.79 

0.89 

1.49 

1.56 

1.38 

1.83 

1.36 

1.73 

1.68 

1.36 

1.26 

3.14 

2.91 

2.41 

1.36 

1.22 

2.18 

2.00 

1.68 

1.32 

0.89 

2.32 

2.16 

1.87 

1.28 

1.24 

1.57

1.67 

1.43 

1.24 

1.24 

1.06 

1.36 

1.60 

1.22 

1.18 

0.89 

1.18 

1.63 

1.20 

0.91 

1.83 

1.71 

1.29 

1.18 

0.79 

1.85 

1.70 

1.57 

(table continues)
Table B.4 (continued)


Task  Categories 


Sketch  maps/overlays/range  cards
Detect/identify  targets
Place/camoufl  tact  equip/mater 
Use  hand  grenades 
Decode  data 

Plan  placement/use  tact  equip

Plan  operations 

Paint 

Analyze  electronic  signals 
Write  documents/correspondence
Troubleshoot  mechanical  systems
Compute  statistics/other  math
Write/deliver  presentations 
Handle  demolitions/mines 
Direct/lead  teams
Engage  in  hand-to-hand  combat 
Troubleshoot  weapons 
Reproduce  printed  material 
Use  audiovisual  equipment 
Conduct  land  surveys 
Prep  equip/supplies  for  air  drop 
Repair  metal 

Assemble  steel  structures 
Operate  radar 
Operate  track  vehicle 
Draw  illustrations 
Interview 
Prepare  parachutes 
Analyze  intelligence  data 
Draw  maps/overlays
Repair  weapons 

Determine  fire  data-indirect  weap 
Repair  plastic/fiberglass 
Produce  technical  drawings 
Control  money 
Analyze  weather  conditions 
Operate  lift/load/grade  equip
Receive  clients/patients/guests 
Construct  wooden  bldgs/struct 
Operate  boats 
Fire  indirect  fire  weapons 
Fire  heavy  direct  fire  weapons 
Provide  programming/DP  support 
Construct  masonry  bldgs/struct 


Rating

FRE  CTI  GSI  OJI  DIF

N=49  N=49  N=49  N=49  N=48

1.14 

0.81 

1.95 

1.79 

1.64 

1.14 

1.00 

2.51 

2.18 

2.14 

1.12 

1.14 

2.44 

2.24 

1.91 

1.04 

0.65 

2.49 

2.08 

1.55 

1.02 

1.16 

1.59 

1.55 

2.00 

1.00 

0.93 

1.74 

1.61 

1.54 

0.95 

1.08 

1.30 

1.42 

1.41 

0.89 

0.71 

0.93 

0.98 

0.87 

0.87 

0.87 

0.53 

0.77 

1.02 

0.79 

0.91 

0.89 

1.10 

1.25 

0.69 

0.85 

0.59 

0.83 

0.89 

0.67 

1.04 

0.69 

1.02 

1.25 

0.63 

0.71 

0.91 

0.95 

1.12 

0.55 

0.53 

1.04 

1.00 

1.10 

0.55 

0.42 

1.02 

1.00 

1.08 

0.55 

0.24 

1.04 

1.02 

1.08 

0.53 

0.59 

0.91 

0.87 

0.64 

0.51 

0.40 

0.44 

0.67 

0.54 

0.44 

0.55 

0.49 

0.55 

0.50 

0.42 

0.24 

0.85 

0.83 

0.81 

0.40 

0.46 

0.51 

0.65 

0.77 

0.38 

0.42 

0.26 

0.44 

0.56 

0.34 

0.55 

0.53 

0.65 

0.77 

0.34 

0.51 

0.38 

0.44 

0.64 

0.30 

0.30 

0.40 

0.46 

0.52 

0.26 

0.32 

0.34 

0.34 

0.27 

0.24 

0.40 

0.36 

0.46 

0.47 

0.24 

0.24 

0.28 

0.28 

0.33 

0.18 

0.26 

0.40 

0.40 

0.33 

0.16 

0.16 

0.32 

0.28 

0.22 

0.16 

0.18 

0.24 

0.26 

0.25 

0.14 

0.24 

0.30 

0.32 

0.31 

0.12 

0.24 

0.08 

0.24 

0.35 

0.12 

0.10 

0.08 

0.14 

0.16 

0.10 

0.10 

0.14 

0.12 

0.12 

0.10 

0.10 

0.12 

0.10 

0.27 

0.10 

0.10 

0.08 

0.12 

0.27 

0.08 

0.12 

0.18 

0.12 

0.22 

0.08 

0.02 

0.02

0.08

0.14 

0.06 

0.06 

0.08 

0.08 

0.12 

0.06 

0.08 

0.08 

0.12 

0.12 

0.06 

0.08 

0.06 

0.08 

0.16 

0.06 

0.02 

0.02 

0.08 

0.12 

0.06 

0.02 

0.02 

0.02 

0.06 

(table continues)
Table B.4 (continued)

                                                  Rating
Task Categories                       FRE    CTI    GSI    OJI    DIF
                                      N=49   N=49   N=49   N=49   N=48

Install pipe assemblies               0.04   0.02   0.02   0.02   0.10
Translate foreign languages           0.04   0.08   0.08   0.08   0.08
Select/lay/clean med/dent equip       0.04   0.06   0.06   0.06   0.06
Operate power excavating equip        0.04   0.04   0.04   0.04   0.08
Control air traffic                   0.02   0.02   0.02   0.02   0.06
Prep heavy weap for tactical use      0.02   0.02   0.02   0.02   0.10
Load/unload artillery/tank guns       0.02   0.01   0.02   0.02   0.10
Perform medical lab procedures        0.00   0.00   0.00   0.00   0.00
Provide medical/dental treatment      0.00   0.00   0.00   0.00   0.00
Cook                                  0.00   0.00   0.00   0.00   0.00

Note. FRE = Frequency
      CTI = Core Technical Importance
      GSI = General Soldiering Importance
      OJI = Overall Importance
      DIF = Difficulty
      N = the number of participants on which the task means
          for that rating were based


Table  B.5 


Army Task Questionnaire Mean Ratings: 31C - Single Channel Radio
Operator


_ Rating _ 

Task  Categories  FRE  CTI  GSI  OJI  DIF 

N=76  N=76  N=76  N=76  N=76 


Send/receive  radio  messages 
Operate  electronic  equipment 
Type 

Install  electronic  components 

Perform  op  maint  chcks/svcs 

Operate  wheeled  vehicle 

Operate  gas/electric  power  equip 

Read  tech  manl/field  manl/etc 

Decode  data 

Act  as  a  model 

Survive  in  the  field 

Lead 

Communicate 
Use  maps 

Protect  against  NBC  hazards 

Monitor/inspect 

Train 

Fire  individual  weapons 
Counsel 

Perform  op  chcks/svcs  on  weap 
Install  wire/cables 
Give  directions/instructions 
Prep  technical  forms/documents
Place/camoufl  tact  equip/mater
Order  equipment/supplies 
Pack/load  materials 
Navigate 
Give  first  aid 
Give  short  oral  reports 
Analyze  electronic  signals 
Inspect  electrical  systems 
Plan  placement/use  tact  equip 
Personnel  Administration 
Inspect  electronic  systems 
Paint 

Move/react  in  the  field 

Sketch  maps/overlays/range  cards

Control  individuals/crowds 

Provide  counseling 

Know  customs/laws  of  war 

Use  hand  &  arm  signals 


4.67 

4.63 

3.85 

4.34 

3.00 

4.39 

4.52 

2.77 

4.03 

3.35 

4.30 

4.23 

2.30 

3.63 

2.90 

4.28 

4.44 

2.72 

3.88 

3.38 

4.27 

4.29 

3.88 

4.30 

2.70 

4.21 

3.88 

3.90 

4.05 

2.45 

3.86 

4.02 

2.64 

3.69 

2.92 

3.82 

4.27 

3.89 

4.10 

2.64 

3.68 

4.05 

3.32 

3.72 

3.34 

3.52 

3.13 

3.86 

3.81 

3.03 

3.32 

3.27 

4.18 

4.01 

3.19 

3.30 

3.13 

3.84 

3.80 

3.28 

3.21 

3.06 

3.56 

3.54 

2.54 

3.18 

3.61 

4.17 

3.98 

3.07 

3.07 

3.21 

4.49 

4.10 

3.21 

3.05 

2.96 

3.32 

3.30 

2.72 

2.97 

3.29 

3.50 

3.50 

2.78 

2.96 

2.65 

4.41 

4.00 

2.77 

2.92 

2.77 

3.34 

3.42 

2.85 

2.81 

2.78 

4.21 

3.86 

2.55 

2.78 

3.21 

2.37 

2.94 

2.28 

2.76 

2.94 

3.20 

3.24 

2.32 

2.71 

2.80 

2.03 

2.73 

2.32 

2.47 

2.33 

3.09 

2.92 

2.19 

2.42 

2.46 

2.21 

2.42 

2.00 

2.42 

2.48 

2.46 

2.72 

2.40 

2.36 

2.86 

3.86 

3.61 

3.22 

2.35 

2.59 

4.09 

3.75 

3.13 

2.30 

2.38 

3.09 

2.94 

2.34 

2.27 

2.76 

1.80 

2.44 

3.02 

2.25 

2.89 

1.67 

2.51 

2.69 

2.19 

2.77 

2.76 

2.85 

2.48 

2.13 

2.16 

2.48 

2.46 

1.93 

2.05 

2.73 

1.39 

2.34 

2.55 

1.96 

1.36 

1.64 

1.76 

1.32 

1.85 

1.84 

3.24 

2.97 

2.44 

1.77 

1.94 

2.78 

2.59 

2.56 

1.68 

1.16 

2.05 

1.89 

1.72 

1.65 

1.78 

1.91 

2.02 

1.63 

1.63 

1.60 

2.93 

2.71 

2.14 

1.52 

1.54 

2.04 

1.97 

1.42 

(table  continues) 
Table B.5 (continued)


Task  Categories 


Record/file/dispatch  information 
Conduct  land  surveys 
Plan  operations 
Operate  computer  hardware 
Direct/lead  teams 
Analyze  weather  conditions 
Troubleshoot  mechanical  systems 
Detect/identify  targets 
Repair  mechanical  systems 
Use  hand  grenades 
Operate  track  vehicle 
Repair  electrical  systems 
Assemble  steel  structures 
Troubleshoot  weapons 
Write  documents/correspondence
Repair  electronic  components 
Compute  statistics/other  math 
Receive/store/issue  supp/equip 
Engage  in  hand-to-hand  combat 
Reproduce  printed  material 
Write/deliver  presentations 
Use  audiovisual  equipment 
Handle  demolitions/mines 
Draw  maps/overlays
Prep  equip/supplies  for  air  drop 
Analyze  intelligence  data 
Estimate  time/cost  of  maint  ops 
Repair  weapons 
Repair  metal 

Provide  programming/DP  support

Draw  illustrations 

Receive  clients/patients/guests 

Repair  plastic/fiberglass 

Prepare  parachutes 

Control  money 

Operate  radar 

Determine  fire  data-indirect  weap 
Translate  foreign  languages 
Produce  technical  drawings 
Interview 

Construct  wooden  bldgs/struct
Cook 

Provide  medical/dental  treatment 
Operate  lift/load/grade  equip 


Rating

FRE  CTI  GSI  OJI  DIF

N=76  N=76  N=76  N=76  N=76

1.50 

1.52 

1.21 

1.46 

1.26 

1.50 

1.78 

2.03 

2.01 

1.67 

1.47 

1.73 

1.98 

2.05 

1.98 

1.42 

1.60 

1.03 

1.53 

1.72 

1.38 

1.68 

2.17 

2.18 

2.18 

1.31 

1.62 

1.48 

1.57 

1.50 

1.28 

1.65 

1.42 

1.66 

2.08 

1.28 

1.38 

2.59 

2.39 

2.40 

1.15 

1.59 

1.30 

1.63 

2.03 

1.14 

1.20 

2.78

2.36

1.78 

1.14 

1.07 

1.10 

1.18 

1.28 

1.05 

1.61 

0.92 

1.48 

1.97 

1.02 

1.10 

0.75 

1.02 

0.84 

0.97 

1.19 

1.69 

1.61 

1.42 

0.96 

1.15 

0.94 

1.14 

1.25 

0.81 

1.32 

0.85 

1.23 

1.54 

0.81 

1.21 

0.86 

1.14 

1.31 

0.78 

0.88 

0.84 

0.97 

0.80 

0.76 

0.60 

1.53 

1.32 

1.46 

0.72 

0.70 

0.62 

0.73 

0.66 

0.57 

0.65 

0.72 

0.78 

0.84 

0.50

0.54 

0.54 

0.61

0.55 

0.42 

0.34 

0.88 

0.81 

0.90 

0.40 

0.46 

0.52 

0.53 

0.46 

0.34 

0.52 

0.42 

0.49 

0.68 

0.32 

0.51 

0.48 

0.53 

0.64 

0.32 

0.40 

0.29 

0.41 

0.40 

0.28 

0.44 

0.61 

0.61 

0.65 

0.28 

0.30 

0.27 

0.30 

0.50 

0.27 

0.28 

0.26 

0.30 

0.39 

0.26 

0.32 

0.32 

0.31 

0.30 

0.25 

0.22 

0.23 

0.23 

0.21 

0.19 

0.15 

0.15 

0.17 

0.23 

0.17 

0.25 

0.22 

0.23 

0.36 

0.15 

0.15 

0.15 

0.15 

0.22 

0.14 

0.10 

0.10 

0.13 

0.25 

0.13 

0.18 

0.23 

0.21 

0.34 

0.11 

0.11 

0.10 

0.07 

0.30 

0.10 

0.13 

0.10 

0.10 

0.11 

0.10 

0.13 

0.19 

0.17 

0.21 

0.10 

0.09 

0.06 

0.09 

0.18 

0.09 

0.06 

0.09 

0.09 

0.14 

0.09 

0.15 

0.25 

0.22 

0.21 

0.07 

0.06 

0.07 

0.10 

0.17 

(table  continues) 




Table B.5 (continued)

                                                  Rating
Task Categories                       FRE    CTI    GSI    OJI    DIF
                                      N=76   N=76   N=76   N=76   N=76

Fire indirect fire weapons            0.07   0.09   0.09   0.07   0.11
Control air traffic                   0.06   0.11   0.09   0.10   0.15
Install pipe assemblies               0.05   0.03   0.02   0.03   0.11
Perform medical lab procedures        0.05   0.05   0.06   0.06   0.09
Load/unload artillery/tank guns       0.03   0.07   0.07   0.06   0.07
Operate power excavating equip        0.03   0.01   0.02   0.03   0.09
Construct masonry bldgs/struct        0.03   0.02   0.02   0.02   0.09
Fire heavy direct fire weapons        0.02   0.03   0.05   0.05   0.07
Select/lay/clean med/dent equip       0.02   0.03   0.02   0.03   0.02
Operate boats                         0.02   0.01   0.02   0.02   0.07
Prep heavy weap for tactical use      0.01   0.01   0.01   0.01   0.02

Note. FRE = Frequency
      CTI = Core Technical Importance
      GSI = General Soldiering Importance
      OJI = Overall Importance
      DIF = Difficulty
      N = the number of participants on which the task means
          for that rating were based
Table B.6


Army  Task  Questionnaire  Mean  Ratings:  31D  -  Mobile  Subscriber 
Equipment  Transmission  System  Operator 


_ Rating _ 

Task  Categories  FRE  CTI  GSI  OJI  DIF 

N=17  N=17  N=17  N=17  N=17 


Operate  electronic  equipment 
Install  electronic  components 
Perform  op  maint  chcks/svcs 
Read  tech  manl/field  manl/etc 
Install  wire/cables 
Send/receive  radio  messages 
Operate  wheeled  vehicle 
Act  as  a  model 
Lead 

Communicate 

Operate  gas/electric  power  equip
Counsel 

Place/camoufl  tact  equip/mater
Give  directions/instructions 
Use  maps 
Train 

Monitor/inspect

Inspect  electrical  systems 

Personnel  Administration

Inspect  electronic  systems 

Give  short  oral  reports 

Survive  in  the  field 

Decode  data 

Provide  counseling 

Prep  technical  forms/documents

Plan  placement/use  tact  equip 

Order  equipment/supplies 

Assemble  steel  structures 

Pack/load  materials 

Troubleshoot  mechanical  systems 

Plan  operations 

Protect  against  NBC  hazards 

Give  first  aid 

Fire  individual  weapons 

Navigate 

Use  hand  &  arm  signals 
Perform  op  chcks/svcs  on  weap 
Know  customs/laws  of  war
Paint 

Conduct  land  surveys 
Control  individuals/crowds


4.70

4.58 

2.88 

3.82 

3.17 

4.47 

4.29 

2.70 

3.64 

3.23 

4.35 

4.17 

3.76 

4.35 

2.17 

4.23 

4.35 

3.52 

4.05 

2.47 

4.05 

3.93 

2.37 

3.12 

2.00 

4.05 

4.35 

3.35 

3.88 

2.64 

3.70 

3.76 

3.23 

3.58 

1.76 

3.70 

3.47 

4.00 

4.00 

2.88 

3.47 

3.76 

3.70 

3.58 

2.88 

3.41 

3.35 

3.58 

3.58 

2.52 

3.29 

3.05 

2.29 

2.58 

2.00 

3.29 

3.50 

3.68 

3.81 

2.93 

3.25 

3.25 

3.56 

3.50 

1.87 

3.11 

2.93 

2.18 

2.75 

1.87 

3.11 

3.88 

3.82 

4.11 

2.82 

3.11 

3.52 

3.35 

3.47 

2.64 

3.00 

3.11 

3.11 

3.11 

2.29 

2.35 

2.76 

2.05 

2.47 

2.05 

2.35 

2.11 

2.35 

2.64 

1.88 

2.17 

2.41 

1.70 

2.23 

2.05 

2.11 

2.11 

2.11 

2.47 

1.76 

2.11 

2.75 

3.52 

3.23 

2.29 

2.00 

2.23 

2.11 

2.47 

2.05 

1.94 

2.00

2.00 

2.35 

1.82 

1.94 

2.00 

1.88 

2.23 

1.35 

1.94 

2.06 

1.31 

1.81 

1.62 

1.94 

1.94 

1.70 

2.05 

1.76 

1.94 

1.94 

0.94 

1.64 

1.35 

1.94 

1.70 

1.41 

1.64 

1.58 

1.88 

2.11 

1.88 

2.17 

2.11 

1.82 

2.05 

1.88 

2.29 

1.82 

1.82 

2.17 

3.52 

3.11 

2.37 

1.82 

1.93 

3.47 

2.88 

2.12 

1.64 

1.23 

3.07 

2.64 

2.07 

1.58 

1.88 

2.64 

2.47 

2.35 

1.52 

1.58 

1.47 

1.64 

1.05 

1.52 

1.41 

2.82 

2.70

1.82 

1.35 

1.25 

2.35 

2.11 

1.75 

1.35 

1.11 

0.58 

1.00 

0.88 

1  n  c 

X.  - 

1  O') 

•  W  M 

2.23 

2.05 

1.52 

1.29 

1.06 

2.05 

1.88 

1.52 

(table  continues) 




Table B.6 (continued)


Task  Categories 


Type 

Analyze  weather  conditions 
Operate  computer  hardware 
Direct/lead  teams 
Repair  mechanical  systems 
Analyze  electronic  signals 
Record/file/dispatch  information 
Repair  electrical  systems 
Write/deliver  presentations 
Use  hand  grenades 
Move/react  in  the  field 
Repair  electronic  components 
Receive/store/issue  supp/equip 
Troubleshoot  weapons 
Reproduce  printed  material 
Engage  in  hand-to-hand  combat 
Repair  weapons 

Write  documents /correspondence 
Analyze  intelligence  data 
Estimate  time/cost  of  maint  ops 
Control  money 
Interview 

Sketch  maps/overlays/range  cards
Detect/identify  targets 
Draw  illustrations 
Prep  equip/supplies  for  air  drop 
Translate  foreign  languages 
Provide  medical/dental  treatment 
Construct  wooden  bldgs/struct 
Receive  clients/patients/guests 
Handle  demolitions/mines
Cook 

Perform  medical  lab  procedures 
Repair  plastic/fiberglass
Compute  statistics/other  math 
Repair  metal 

Construct  masonry  bldgs/struct 

Prepare  parachutes 

Draw  maps /overlays 

Operate  power  excavating  equip 

Fire  heavy  direct  fire  weapons 

Operate  track  vehicle 

Determine  fire  data-indirect  weap


Rating

FRE  CTI  GSI  OJI  DIF

N=17  N=17  N=17  N=17  N=17

1.11 

1.00 

0.94 

1.05 

1.00 

1.11 

1.35 

1.29 

1.35 

1.41 

1.05 

1.05 

0.70 

0.94 

1.58 

1.05 

1.11 

1.17 

1.23 

1.00 

1.05 

1.29 

1.41 

1.47 

1.94 

1.00 

1.88 

1.23 

1.23 

1.94 

0.94 

1.00 

0.94 

1.05 

0.64 

0.94 

1.58 

1.17 

1.29 

1.76 

0.88 

0.43 

0.37 

0.43 

0.75 

0.76 

0.62 

2.23 

1.88 

1.25 

0.76 

0.87 

1.88 

1.52 

1.41 

0.76 

1.47 

1.00 

1.17 

1.58 

0.64 

0.64 

0.70 

0.70 

0.70 

0.64 

0.47 

1.00 

1.00 

0.58 

0.58 

0.64 

0.82 

0.82 

0.88 

0.52 

0.43 

1.52 

1.11 

1.47 

0.47 

0.29 

0.64 

0.52 

0.52 

0.47 

0.58 

0.58 

0.58 

0.47 

0.43 

0.75 

0.81 

0.81 

0.81 

0.41 

0.35 

0.47 

0.41 

0.52 

0.35 

0.05 

0.11 

0.29 

0.17 

0.35 

0.35 

0.35 

0.47 

0.52 

0.29 

0.41 

0.82 

0.82 

0.64 

0.29 

0.11 

0.64 

0.47 

0.52 

0.17 

0.47 

0.41 

0.47 

0.47 

0.17 

0.17 

0.23 

0.23 

0.41 

0.11 

0.11 

0.11 

0.17 

0.17 

0.11 

0.11 

0.23 

0.17 

0.11 

0.11 

0.11 

0.17 

0.11 

0.41 

0.05 

0.05 

0.05 

0.05 

0.05 

0.05 

0.11 

0.23 

0.11 

0.23 

0.05 

0.11 

0.05 

0.05 

0.05 

0.05 

0.05 

0.05 

0.05 

0.05 

0.05 

0.05 

0.11 

0.11 

0.23 

0.05 

0.05 

0.05 

0.05 

0.23 

0.05 

0.05 

0.11 

0.11 

0.23 

0.05 

0.05 

0.05 

0.05 

0.11 

0.05 

0.05 

0.05 

0.11 

0.23 

0.05 

0.05 

0.11 

0.11 

0.17 

0.00 

0.00 

0.00 

0.00 

0.00 

0.00 

0.00 

0.00 

0.00 

0.00 

0.00

0.00

0.00

0.00

0.00 

0.00 

0.00 

0.00 

0.00 

0.00 

(table  continues) 
Table B.6 (continued)

                                                  Rating
Task Categories                       FRE    CTI    GSI    OJI    DIF
                                      N=17   N=17   N=17   N=17   N=17

Provide programming/DP support        0.00   0.00   0.00   0.00   0.00
Control air traffic                   0.00   0.00   0.00   0.00   0.00
Prep heavy weap for tactical use      0.00   0.00   0.00   0.00   0.00
Load/unload artillery/tank guns       0.00   0.00   0.00   0.00   0.00
Fire indirect fire weapons            0.00   0.00   0.00   0.00   0.00
Produce technical drawings            0.00   0.00   0.00   0.00   0.00
Operate lift/load/grade equip         0.00   0.00   0.00   0.00   0.00
Operate boats                         0.00   0.00   0.00   0.00   0.00
Install pipe assemblies               0.00   0.00   0.00   0.00   0.00
Select/lay/clean med/dent equip       0.00   0.00   0.00   0.00   0.00
Operate radar                         0.00   0.00   0.00   0.00   0.00
Use audiovisual equipment             0.00   0.00   0.00   0.00   0.00

Note. FRE = Frequency
      CTI = Core Technical Importance
      GSI = General Soldiering Importance
      OJI = Overall Importance
      DIF = Difficulty
      N = the number of participants on which the task means
          for that rating were based
Table B.7


Army  Task  Questionnaire  Mean  Ratings:  51B  -  Carpentry  and 
Masonry  Specialist 


Task  Categories 


Construct  wooden  bldgs/struct
Construct  masonry  bldgs/struct 
Perform  op  maint  chcks/svcs 
Operate  wheeled  vehicle 
Paint 

Read  tech  manl/field  manl/etc 

Act  as  a  model 

Protect  against  NBC  hazards 

Fire  individual  weapons 

Operate  gas/electric  power  equip

Perform  op  chcks/svcs  on  weap 

Communicate 

Operate  power  excavating  equip 
Give  directions/instructions 
Use  maps 

Survive  in  the  field 
Lead 

Give  first  aid 

Handle  demolitions/mines 

Counsel 

Place/camoufl  tact  equip/mater
Send/receive  radio  messages 
Train 

Assemble  steel  structures 

Move/react  in  the  field 

Pack/load  materials 

Sketch  maps/overlays/range  cards

Monitor/inspect

Control  individuals/crowds

Navigate 

Install  pipe  assemblies 
Use  hand  &  arm  signals 
Give  short  oral  reports 
Compute  statistics/other  math 
Install  wire/cables 
Personnel  Administration 
Know  customs/laws  of  war
Use  hand  grenades 
Repair  mechanical  systems 
Detect/identify  targets 
Repair  metal 


Rating 


FRE 

N=80 

CTI 

N=80 

GSI 

N=80 

OJI 

N=80 

DIF 

N=79 

4.26 

4.74 

2.10 

4.48 

3.84 

3.92 

4.58 

2.02 

4.25 

4.11 

3.78 

3.47 

3.77 

4.00 

2.98 

3.73 

3.40 

3.52 

3.82 

2.74 

3.20 

3.17 

1.58 

2.86 

2.10 

3.00 

3.58 

3.45 

3.56 

2.82 

2.97 

3.17 

3.78 

3.67 

2.93 

2.95 

2.16 

4.23 

4.02 

3.24 

2.91 

2.34 

4.35 

4.00 

2.83 

2.70 

3.25 

2.07 

3.16 

2.93 

2.68 

2.55 

4.18 

3.85 

3.01 

2.67 

2.67 

3.24 

3.21 

2.48 

2.67 

3.17 

1.63 

2.98 

2.94 

2.62 

2.97 

3.24 

3.25 

2.55 

2.48 

2.47 

3.78 

3.58 

3.16 

2.45 

2.25 

3.93 

3.76 

3.06 

2.42 

2.78 

3.07 

3.07 

2.86 

2.35 

2.51 

4.22 

3.90 

3.40 

2.30 

2.61 

3.11 

3.30 

3.57 

2.27 

2.12 

2.79 

2.77 

2.48 

2.22 

2.15 

3.52 

3.47 

2.86 

1.97 

1.90 

3.22 

3.02 

2.62 

1.96 

2.17 

2.42 

2.50 

2.16 

1.91 

2.84 

1.53 

2.60 

3.29 

1.91 

1.75 

3.53 

3.32 

2.83 

1.87 

2.06 

2.22 

2.45 

2.50 

1.82 

1.67 

2.66 

2.48 

2.55 

1.81 

2.28 

2.37 

2.41 

2.10 

1.73 

1.20 

2.46 

2.42 

2.01 

1.73 

1.86 

3.21 

3.12 

3.00 

1.67 

2.31 

1.08 

2.22 

2.79 

1.65 

1.93 

2.43 

2.28 

1.78 

1.63 

1.91 

2.72 

2.67 

2.24 

1.63 

2.28 

1.36 

2.06 

2.25 

1.51 

1.83 

1.22 

1.91 

1.93 

1.47 

1.52 

1.76 

1.78 

1.70 

1.47 

1.30 

2.74 

2.54 

2.26 

1.46 

1.15 

3.15 

2.79 

2.19 

1.35 

1.60 

1.56 

1.87 

1.98 

1.28 

1.15 

2.62 

2.32 

2.43 

1.25 

1.67 

0.58 

1.47 

2.15 

(table  continues) 
Table B.7 (continued)


Task  Categories 


Order  equipment/supplies
Direct/lead  teams 
Receive/store/issue  supp/equip 
Repair  plastic/fiberglass 
Provide  counseling 
Plan  operations 
Troubleshoot  weapons 
Conduct  land  surveys 
Install  electronic  components 
Plan  placement/use  tact  equip 
Prepare  tech  forms/documents
Engage  in  hand-to-hand  combat 
Draw  illustrations 
Decode  data 

Repair  electrical  systems 
Inspect  electrical  systems 
Troubleshoot  mechanical  systems 
Record/file/dispatch  information 
Produce  technical  drawings 
Operate  electronic  equipment 
Operate  lift/load/grade  equip 
Reproduce  printed  material 
Type 

Estimate  time/cost  of  maint  ops 
Write  documents/correspondence
Analyze  weather  conditions 
Draw  maps/overlays
Repair  weapons 
Write/deliver  presentations 
Repair  electronic  components 
Determine  fire  data-indirect  weap 
Control  money 
Use  audiovisual  equipment 
Inspect  electronic  systems 
Operate  computer  hardware 
Operate  track  vehicle 
Interview 

Analyze  intelligence  data 
Prep  equip/supplies  for  air  drop 
Provide  programming/DP  support 
Operate  boats 

Receive  clients/patients/guests 
Load/unload  artillery/tank  guns 
Analyze  electronic  signals 


Rating 


FRE 

N=80 

CTI 

N=8C 

GSI 

N=C0 

OJI 

N=80 

DIF 

N=79 

1.22 

1.73 

1.31 

1.63 

1.87 

1.03 

1.36 

1.82 

1.86 

1.79 

1.00 

1.31 

0.91 

1.32 

1.26 

0.97 

1.18 

0.43 

1.10 

1.38 

0.92 

0.90 

1.33 

1.31 

1.21 

0.92 

1.41 

1.47 

1.57 

1.59 

0.88 

1.02 

1.48 

1.42 

1.39 

0.87 

1.26 

1.28 

1.32 

1.78 

0.87 

0.95 

1.22 

1.26 

1.57 

0.83 

0.86 

1.36 

1.27 

1.35 

0.81 

0.87 

0.80 

0.98 

0.88 

0.80 

0.69 

1.82 

1.63 

1.91 

0.76 

1.23 

0.68 

1.10 

1.22 

0.75 

0.68 

1.32 

1.31 

1.70 

0.71 

1.00 

0.56 

0.92 

1.25 

0.71 

0.98 

0.60 

0.91 

1.32 

0.70 

0.96 

0.95 

1.00 

1.38 

0.67 

0.57 

0.65 

0.72 

0.78 

0.61 

0.93 

0.35 

0.85 

1.17 

0.58 

0.73 

1.03 

0.96 

1.08 

0.53 

0.63 

0.32 

0.61 

0.96 

0.45 

0.30 

0.26 

0.30 

0.31 

0.45 

0.37 

0.37 

0.40 

0.78 

0.39 

0.55 

0.47 

0.56 

0.67 

0.37 

0.60 

0.56 

0.65 

0.72 

0.35 

0.48 

0.50 

0.57 

0.53 

0.33 

0.33 

0.41 

0.45 

0.62 

0.30 

0.23 

0.63 

0.61 

0.58 

0.27 

0.35 

0.35 

0.41 

0.51 

0.27 

0.31 

0.21 

0.36 

0.45 

0.22 

0.22 

0.35 

0.33 

0.46 

0.21 

0.22 

0.18 

0.21 

0.26 

0.18 

0.17 

0.20 

0.25 

0.32 

0.16 

0.21 

0.11 

0.20 

0.30 

0.16 

0.13 

0.13 

0.13 

0.29 

0.12 

0.15 

0.15 

0.15 

0.20 

0.11 

0.12 

0.13 

0.13 

0.16 

0.11 

0.12 

0.22 

0.21 

0.29 

0.10 

0.06 

0.11 

0.07 

0.25 

0.07 

0.10 

0.03 

0.07 

0.13 

0.07 

0.08 

0.08 

0.10 

0.17 

0.06 

0.10 

0.10 

0.11 

0.03 

0.06 

0.03 

0.02 

0.02 

0.08 

0.06 

0.03 

0.12 

0.11 

0.13 

(table  continues) 
Table B.7 (continued)

                                                   Rating
Task Categories                     FRE    CTI    GSI    OJI    DIF
                                    N=80   N=80   N=80   N=80   N=79

Cook                                0.05   0.05   0.07   0.06   0.06
Fire indirect fire weapons          0.05   0.05   0.05   0.05   0.02
Provide medical/dental treatment    0.03   0.05   0.12   0.11   0.11
Translate foreign languages         0.03   0.07   0.05   0.07   0.07
Perform medical lab procedures      0.03   0.03   0.05   0.06   0.01
Prep heavy weap for tactical use    0.02   0.02   0.03   0.02   0.03
Prepare parachutes                  0.02   0.00   0.01   0.00   0.03
Control air traffic                 0.00   0.00   0.00   0.00   0.00
Fire heavy direct fire weapons      0.00   0.00   0.00   0.00   0.00
Operate radar                       0.00   0.00   0.00   0.00   0.00
Select/lay/clean med/dent equip     0.00   0.00   0.00   0.00   0.00

Note. FRE = Frequency
      CTI = Core Technical Importance
      GSI = General Soldiering Importance
      OJI = Overall Importance
      DIF = Difficulty
      N = the number of participants on which the task means
          for that rating were based
Table B.8

Army Task Questionnaire Mean Ratings: 54B - Chemical Operations Specialist

                                                   Rating
Task Categories                     FRE    CTI    GSI    OJI    DIF
                                    N=67   N=67   N=67   N=67   N=66

Protect against NBC hazards         4.67   4.86   4.57   4.71   3.26
Read tech manl/field manl/etc       4.01   4.25   3.97   4.19   2.53
Perform op maint chcks/svcs         3.88   4.16   4.07   4.28   2.89
Operate wheeled vehicle             3.67   3.68   3.53   3.92   2.40
Use maps                            3.56   4.20   4.22   4.23   3.09
Operate gas/electric power equip    3.46   3.83   2.42   3.36   2.87
Survive in the field                3.25   3.58   4.14   4.06   3.22
Send/receive radio messages         3.17   3.70   3.91   3.86   2.81
Fire individual weapons             3.06   3.09   4.12   3.89   2.59
Act as a model                      3.01   2.97   3.40   3.42   2.83
Place/camoufl tact equip/mater      2.92   3.26   3.58   3.55   2.64
Navigate                            2.91   3.52   3.86   3.80   3.34
Give directions/instructions        2.85   3.25   3.17   3.26   2.50
Perform op chcks/svcs on weap       2.82   3.00   4.16   4.10   2.47
Give short oral reports             2.76   2.98   3.10   3.19   2.69
Give first aid                      2.76   3.07   4.17   3.88   3.21
Sketch maps/overlays/range cards    2.56   3.04   3.46   3.37   3.04
Train                               2.49   3.00   3.01   3.22   2.75
Use hand & arm signals              2.49   2.78   3.10   2.97   2.18
Move/react in the field             2.47   2.38   3.52   3.33   2.67
Monitor/inspect                     2.34   3.04   3.00   3.12   2.72
Communicate                         2.29   2.77   2.90   2.97   2.31
Know customs/laws of war            2.17   2.09   3.09   2.82   2.31
Lead                                2.13   2.73   2.95   2.92   2.72
Order equipment/supplies            2.11   2.50   1.90   2.38   2.12
Pack/load materials                 2.06   2.04   2.19   2.43   2.32
Operate electronic equipment        2.03   2.59   2.58   2.70   2.50
Analyze weather conditions          2.00   2.86   1.95   2.41   2.47
Plan placement/use tact equip       1.97   2.44   2.53   2.55   2.28
Operate track vehicle               1.95   2.29   1.86   2.20   2.21
Detect/identify targets             1.94   2.16   3.09   2.83   2.75
Direct/lead teams                   1.94   2.68   2.71   2.76   2.83
Repair mechanical systems           1.92   2.46   2.23   2.37   2.56
Compute statistics/other math       1.88   2.65   1.61   2.17   2.59
Prep technical forms/documents      1.85   2.11   1.55   2.10   1.95
Plan operations                     1.82   2.37   2.36   2.40   2.49
Paint                               1.80   1.18   1.31   1.47   1.09
Record/file/dispatch information    1.77   1.92   1.47   1.92   1.81
Use hand grenades                   1.77   1.95   3.37   2.97   1.86
Troubleshoot mechanical systems     1.76   1.98   1.77   1.87   2.36
Decode data                         1.74   2.32   2.41   2.41   2.68

(table continues)
Table B.8 (continued)

                                                   Rating
Task Categories                     FRE    CTI    GSI    OJI    DIF
                                    N=67   N=67   N=67   N=67   N=66

Control individuals/crowds          1.73   1.47   2.25   2.09   1.98
Counsel                             1.68   1.92   2.35   2.38   2.16
Receive/store/issue supp/equip      1.61   1.74   1.35   1.67   1.78
Handle demolitions/mines            1.59   2.14   2.14   2.31   2.90
Install electronic components       1.52   1.72   1.86   2.03   1.90
Conduct land surveys                1.38   1.80   1.65   1.74   1.68
Personnel Administration            1.37   1.47   1.64   1.67   1.60
Write/deliver presentations         1.26   1.29   1.14   1.25   1.60
Type                                1.23   1.06   1.06   1.28   1.87
Install wire/cables                 1.23   1.37   1.39   1.50   1.29
Write documents/correspondence      1.20   1.38   1.25   1.38   1.74
Troubleshoot weapons                1.16   1.49   2.13   1.22   1.72
Engage in hand-to-hand combat       1.13   1.16   2.20   1.98   2.19
Operate computer hardware           0.98   0.83   0.88   1.03   1.75
Provide counseling                  0.88   1.10   1.22   1.26   1.09
Draw maps/overlays                  0.82   1.03   0.97   1.03   1.01
Reproduce printed material          0.77   0.61   0.61   0.82   0.53
Use audiovisual equipment           0.59   0.59   0.59   0.64   0.57
Prep equip/supplies for air drop    0.58   0.62   0.59   0.74   1.45
Inspect electrical systems          0.46   0.61   0.48   0.62   0.92
Interview                           0.43   0.46   0.53   0.52   0.60
Repair weapons                      0.41   0.46   0.65   0.64   0.87
Analyze intelligence data           0.41   0.61   0.58   0.59   0.69
Estimate time/cost of maint ops     0.37   0.53   0.50   0.52   0.71
Install pipe assemblies             0.35   0.44   0.16   0.31   0.45
Draw illustrations                  0.30   0.42   0.36   0.36   0.50
Repair electrical systems           0.29   0.44   0.37   0.40   0.63
Inspect electronic systems          0.23   0.31   0.25   0.31   0.39
Analyze electronic signals          0.23   0.41   0.38   0.38   0.53
Produce technical drawings          0.22   0.28   0.26   0.25   0.31
Repair electronic components        0.20   0.34   0.26   0.34   0.57
Repair metal                        0.20   0.20   0.19   0.23   0.47
Repair plastic/fiberglass           0.16   0.16   0.13   0.15   0.32
Prepare parachutes                  0.11   0.13   0.10   0.13   0.28
Determine fire data-indirect weap   0.11   0.28   0.23   0.31   0.33
Operate lift/load/grade equip       0.10   0.13   0.09   0.09   0.25
Assemble steel structures           0.10   0.10   0.20   0.19   0.39
Load/unload artillery/tank guns     0.09   0.00   0.06   0.06   0.04
Control money                       0.09   0.04   0.14   0.10   0.12
Prep heavy weap for tactical use    0.07   0.00   0.04   0.04   0.06
Provide programming/DP support      0.06   0.10   0.07   0.07   0.13
Receive clients/patients/guests     0.04   0.07   0.07   0.07   0.07
Fire heavy direct fire weapons      0.04   0.04   0.04   0.04   0.04
Construct wooden bldgs/struct       0.03   0.01   0.03   0.03   0.10

(table continues)
Table B.8 (continued)

                                                   Rating
Task Categories                     FRE    CTI    GSI    OJI    DIF
                                    N=67   N=67   N=67   N=67   N=66

Operate boats                       0.03   0.01   0.01   0.01   0.12
Provide medical/dental treatment    0.01   0.01   0.03   0.01   0.03
Operate radar                       0.01   0.01   0.04   0.04   0.06
Select/lay/clean med/dent equip     0.01   0.01   0.01   0.01   0.07
Construct masonry bldgs/struct      0.01   0.04   0.03   0.03   0.04
Translate foreign languages         0.01   0.04   0.04   0.06   0.06
Control air traffic                 0.01   0.00   0.03   0.03   0.06
Fire indirect fire weapons          0.00   0.00   0.00   0.00   0.00
Cook                                0.00   0.00   0.00   0.00   0.00
Operate power excavating equip      0.00   0.00   0.00   0.00   0.00
Perform medical lab procedures      0.00   0.00   0.00   0.00   0.00

Note. FRE = Frequency
      CTI = Core Technical Importance
      GSI = General Soldiering Importance
      OJI = Overall Importance
      DIF = Difficulty
      N = the number of participants on which the task means
          for that rating were based
Table B.9

Army Task Questionnaire Mean Ratings: 55B - Ammunition Specialist

                                                   Rating
Task Categories                     FRE    CTI    GSI    OJI    DIF
                                    N=61   N=61   N=61   N=61   N=61

Operate lift/load/grade equip       4.06   4.28   2.68   3.66   3.10
Receive/store/issue supp/equip      4.04   4.16   2.95   3.75   2.88
Perform op maint chcks/svcs         3.67   3.48   3.46   3.54   2.48
Pack/load materials                 3.49   3.66   2.73   3.40   2.83
Read tech manl/field manl/etc       3.37   3.57   3.41   3.55   2.51
Operate wheeled vehicle             3.19   3.24   3.29   3.36   2.34
Act as a model                      3.08   2.88   3.21   3.00   2.25
Handle demolitions/mines            3.01   3.37   2.63   3.14   2.47
Communicate                         3.00   3.00   3.20   3.08   2.26
Lead                                2.77   2.70   3.04   2.96   2.39
Counsel                             2.72   2.72   3.01   2.89   2.42
Protect against NBC hazards         2.65   3.41   4.18   3.93   2.93
Fire individual weapons             2.63   2.58   3.93   3.56   2.40
Train                               2.61   2.70   2.93   2.79   2.15
Perform op chcks/svcs on weap       2.57   2.65   3.96   3.73   2.21
Give directions/instructions        2.53   2.56   2.70   2.70   2.01
Give first aid                      2.34   3.01   3.93   3.65   2.75
Prep technical forms/documents      2.31   2.54   1.89   2.35   2.08
Use hand & arm signals              2.29   2.75   2.36   2.60   2.11
Survive in the field                2.23   2.76   3.68   3.35   2.78
Use maps                            2.13   2.49   3.27   2.91   2.68
Paint                               2.08   2.15   1.55   1.81   1.42
Monitor/inspect                     2.01   2.11   2.41   2.23   1.88
Use hand grenades                   1.95   2.54   2.95   2.82   2.03
Personnel Administration            1.90   2.06   2.16   2.11   1.86
Record/file/dispatch information    1.81   2.30   1.69   2.03   1.96
Navigate                            1.77   2.24   3.08   2.86   2.50
Move/react in the field             1.70   1.90   2.70   2.42   2.13
Control individuals/crowds          1.68   1.70   2.08   2.08   1.73
Send/receive radio messages         1.67   2.23   2.73   2.57   2.16
Operate gas/electric power equip    1.57   1.80   1.88   1.93   1.98
Detect/identify targets             1.57   1.83   2.39   2.19   2.04
Plan operations                     1.55   1.83   1.88   1.91   1.88
Place/camoufl tact equip/mater      1.50   1.82   2.52   2.19   1.93
Know customs/laws of war            1.49   1.78   2.36   2.16   1.91
Provide counseling                  1.47   1.44   1.68   1.63   1.36
Compute statistics/other math       1.32   1.47   1.18   1.37   1.09
Direct/lead teams                   1.31   1.44   1.90   1.70   1.73
Sketch maps/overlays/range cards    1.26   1.50   2.28   2.03   2.21
Give short oral reports             1.23   1.41   2.03   1.96   1.62
Order equipment/supplies            1.21   1.42   1.26   1.39   1.52
Plan placement/use tact equip       1.11   1.21   1.50   1.42   1.54

(table continues)
Table B.9 (continued)

                                                   Rating
Task Categories                     FRE    CTI    GSI    OJI    DIF
                                    N=61   N=61   N=61   N=61   N=61

Prep equip/supplies for air drop    1.04   1.74   1.31   1.65   2.01
Type                                1.00   1.21   0.88   1.16   1.57
Reproduce printed material          0.90   0.63   0.72   0.77   0.72
Engage in hand-to-hand combat       0.88   0.95   1.41   1.18   1.43
Operate computer hardware           0.76   0.74   0.55   0.75   1.01
Write documents/correspondence      0.72   0.75   0.82   0.91   0.93
Repair mechanical systems           0.68   0.75   0.72   0.83   1.16
Troubleshoot weapons                0.67   0.95   1.11   1.27   1.13
Operate electronic equipment        0.65   0.80   0.96   0.90   1.05
Conduct land surveys                0.63   0.91   1.01   1.00   1.13
Use audiovisual equipment           0.63   0.60   0.60   0.63   0.72
Construct wooden bldgs/struct       0.59   0.54   0.41   0.45   0.73
Decode data                         0.57   0.68   0.96   0.98   1.00
Troubleshoot mechanical systems     0.55   0.68   0.78   0.68   0.93
Install wire/cables                 0.44   0.55   0.51   0.54   0.48
Load/unload artillery/tank guns     0.42   0.51   0.38   0.45   0.43
Repair weapons                      0.42   0.55   0.63   0.70   0.65
Write/deliver presentations         0.41   0.60   0.62   0.65   0.80
Install electronic components       0.39   0.42   0.50   0.54   0.67
Provide programming/DP support      0.39   0.45   0.36   0.52   0.83
Inspect electrical systems          0.34   0.32   0.39   0.39   0.49
Determine fire data-indirect weap   0.31   0.47   0.47   0.47   0.62
Draw maps/overlays                  0.31   0.41   0.43   0.38   0.68
Operate track vehicle               0.31   0.34   0.26   0.32   0.54
Estimate time/cost of maint ops     0.29   0.34   0.27   0.31   0.37
Control money                       0.27   0.37   0.34   0.36   0.44
Fire indirect fire weapons          0.27   0.34   0.34   0.32   0.39
Repair metal                        0.27   0.24   0.24   0.24   0.32
Draw illustrations                  0.26   0.34   0.36   0.34   0.41
Interview                           0.26   0.38   0.38   0.41   0.42
Operate power excavating equip      0.24   0.27   0.29   0.29   0.31
Prepare parachutes                  0.24   0.29   0.23   0.31   0.44
Analyze weather conditions          0.24   0.32   0.29   0.29   0.29
Produce technical drawings          0.24   0.26   0.29   0.26   0.36
Receive clients/patients/guests     0.21   0.23   0.23   0.31   0.19
Translate foreign languages         0.19   0.23   0.16   0.24   0.50
Repair electrical systems           0.18   0.32   0.27   0.39   0.49
Inspect electronic systems          0.16   0.23   0.27   0.23   0.39
Cook                                0.16   0.19   0.18   0.21   0.18
Analyze intelligence data           0.16   0.19   0.31   0.24   0.32
Repair plastic/fiberglass           0.16   0.13   0.06   0.08   0.16
Control air traffic                 0.14   0.27   0.21   0.23   0.29
Repair electronic components        0.14   0.21   0.23   0.27   0.44
Assemble steel structures           0.14   0.16   0.13   0.18   0.21

(table continues)
Table B.9 (continued)

                                                   Rating
Task Categories                     FRE    CTI    GSI    OJI    DIF
                                    N=61   N=61   N=61   N=61   N=61

Fire heavy direct fire weapons      0.14   0.18   0.16   0.18   0.32
Operate radar                       0.13   0.18   0.19   0.21   0.23
Install pipe assemblies             0.11   0.16   0.09   0.13   0.24
Provide medical/dental treatment    0.11   0.16   0.16   0.16   0.14
Perform medical lab procedures      0.11   0.18   0.06   0.14   0.16
Prep heavy weap for tactical use    0.11   0.11   0.09   0.09   0.09
Analyze electronic signals          0.09   0.11   0.09   0.13   0.11
Operate boats                       0.06   0.09   0.09   0.08   0.23
Construct masonry bldgs/struct      0.04   0.09   0.04   0.04   0.08
Select/lay/clean med/dent equip     0.03   0.06   0.04   0.03   0.03

Note. FRE = Frequency
      CTI = Core Technical Importance
      GSI = General Soldiering Importance
      OJI = Overall Importance
      DIF = Difficulty
      N = the number of participants on which the task means
          for that rating were based
Table B.10

Army Task Questionnaire Mean Ratings: 95B - Military Police

                                                   Rating
Task Categories                     FRE    CTI    GSI    OJI    DIF
                                    N=75   N=75   N=75   N=75   N=74

Send/receive radio messages         4.44   4.46   4.05   4.30   3.00
Operate wheeled vehicle             4.38   4.38   3.62   4.09   2.64
Control individuals/crowds          4.12   4.37   3.00   3.78   3.35
Give directions/instructions        4.09   4.08   3.04   3.69   2.55
Perform op chcks/svcs on weap       3.98   4.20   4.45   4.43   2.71
Fire individual weapons             3.97   4.54   4.50   4.54   3.24
Interview                           3.97   4.37   2.55   3.68   3.45
Use maps                            3.94   4.38   4.26   4.36   3.39
Act as a model                      3.92   4.26   3.68   3.97   3.35
Perform op maint chcks/svcs         3.80   3.74   4.01   4.01   2.41
Navigate                            3.73   4.37   3.98   4.12   3.54
Customs/laws of war                 3.70   4.02   3.32   3.73   3.00
Give short oral reports             3.66   3.92   3.69   3.77   2.98
Survive in the field                3.64   4.25   4.29   4.21   3.44
Use hand & arm signals              3.48   3.83   3.33   3.70   2.56
Read tech manl/field manl/etc       3.45   3.45   3.23   3.39   2.39
Protect against NBC hazards         3.42   3.90   4.31   4.20   3.35
Give first aid                      3.42   4.26   4.10   4.13   3.45
Prep technical forms/documents      3.32   3.45   2.02   2.86   2.91
Sketch maps/overlays/range cards    3.30   3.70   3.53   3.61   3.29
Communicate                         3.29   3.65   3.25   3.48   2.89
Move/react in the field             3.17   3.66   4.02   3.94   3.34
Lead                                3.13   3.62   3.36   3.45   3.16
Counsel                             3.00   3.10   2.85   3.00   2.87
Detect/identify targets             2.80   3.37   3.50   3.45   3.43
Monitor/inspect                     2.76   3.08   2.78   2.88   2.71
Engage in hand-to-hand combat       2.76   3.69   3.16   3.48   3.47
Provide counseling                  2.74   3.09   2.01   2.72   2.75
Train                               2.74   3.09   2.89   3.01   2.77
Write documents/correspondence      2.62   2.88   1.73   2.57   2.77
Decode data                         2.58   3.28   3.20   3.17   3.41
Place/camoufl tact equip/mater      2.41   2.90   3.13   3.01   2.58
Direct/lead teams                   2.40   3.06   2.88   2.94   3.13
Plan placement/use tact equip       2.33   2.80   2.88   2.89   2.78
Record/file/dispatch information    2.29   2.34   1.61   2.20   2.13
Operate electronic equipment        2.25   2.60   2.28   2.56   2.16
Type                                2.25   2.20   1.50   2.01   2.86
Use hand grenades                   2.17   2.68   3.26   2.97   2.08
Personnel Administration            2.13   2.29   2.13   2.24   2.09
Plan operations                     2.04   2.54   2.37   2.44   2.86
Install electronic components       1.93   2.32   2.30   2.36   2.25
Pack/load materials                 1.92   2.01   2.06   2.10   2.34

(table continues)
Table B.10 (continued)

                                                   Rating
Task Categories                     FRE    CTI    GSI    OJI    DIF
                                    N=75   N=75   N=75   N=75   N=74

Conduct land surveys                1.48   1.62   1.65   1.76   1.71
Paint                               1.37   0.77   1.10   1.17   0.91
Operate computer hardware           1.36   1.48   1.04   1.45   2.52
Handle demolitions/mines            1.33   1.64   1.90   1.86   2.31
Operate gas/electric power equip    1.18   1.42   1.41   1.45   2.06
Analyze intelligence data           1.12   1.45   1.23   1.36   1.76
Write/deliver presentations         1.10   1.22   0.89   1.18   1.64
Use audiovisual equipment           1.02   1.04   0.78   1.00   1.45
Install wire/cables                 0.97   1.21   1.29   1.29   1.12
Reproduce printed material          0.97   0.74   0.60   0.72   0.77
Order equipment/supplies            0.90   0.98   0.88   1.01   1.08
Troubleshoot weapons                0.89   1.40   1.48   1.44   1.56
Operate radar                       0.86   0.94   0.36   0.78   1.02
Receive clients/patients/guests     0.81   0.85   0.56   0.84   0.64
Repair mechanical systems           0.80   0.86   1.04   1.05   1.47
Analyze weather conditions          0.78   0.88   0.76   0.81   1.04
Receive/store/issue supp/equip      0.78   0.85   0.84   0.84   1.05
Compute statistics/other math       0.78   0.96   0.73   0.94   1.58
Draw maps/overlays                  0.74   0.93   0.81   0.93   1.02
Determine fire data-indirect weap   0.65   0.93   0.94   0.96   1.25
Analyze electronic signals          0.49   0.80   0.77   0.81   0.95
Translate foreign languages         0.48   0.61   0.40   0.56   1.32
Draw illustrations                  0.48   0.58   0.36   0.54   0.75
Repair weapons                      0.44   0.70   0.73   0.77   0.97
Troubleshoot mechanical systems     0.34   0.48   0.58   0.58   0.71
Control money                       0.25   0.21   0.26   0.33   0.35
Operate boats                       0.24   0.36   0.24   0.37   0.70
Operate track vehicle               0.24   0.34   0.30   0.37   0.60
Inspect electrical systems          0.24   0.22   0.24   0.28   0.43
Prep equip/supplies for air drop    0.22   0.38   0.32   0.35   0.66
Inspect electronic systems          0.21   0.20   0.18   0.18   0.39
Estimate time/cost of maint ops     0.18   0.24   0.20   0.25   0.36
Repair electrical systems           0.17   0.21   0.21   0.24   0.39
Assemble steel structures           0.16   0.20   0.26   0.24   0.28
Control air traffic                 0.14   0.18   0.25   0.22   0.40
Repair metal                        0.14   0.14   0.22   0.18   0.28
Fire indirect fire weapons          0.14   0.14   0.17   0.18   0.24
Produce technical drawings          0.13   0.20   0.10   0.17   0.27
Repair electronic components        0.13   0.14   0.14   0.17   0.32
Provide programming/DP support      0.12   0.18   0.09   0.16   0.36
Provide medical/dental treatment    0.10   0.22   0.22   0.22   0.20
Operate power excavating equip      0.09   0.05   0.06   0.06   0.13
Construct wooden bldgs/struct       0.08   0.12   0.14   0.14   0.23
Prepare parachutes                  0.08   0.08   0.05   0.06   0.17

(table continues)
Table B.10 (continued)

                                                   Rating
Task Categories                     FRE    CTI    GSI    OJI    DIF
                                    N=75   N=75   N=75   N=75   N=74

Repair plastic/fiberglass           0.06   0.02   0.06   0.05   0.21
Fire heavy direct fire weapons      0.05   0.09   0.09   0.12   0.18
Load/unload artillery/tank guns     0.02   0.02   0.05   0.05   0.04
Cook                                0.02   0.05   0.08   0.08   0.06
Operate lift/load/grade equip       0.02   0.02   0.05   0.04   0.08
Construct masonry bldgs/struct      0.02   0.02   0.02   0.02   0.09
Install pipe assemblies             0.02   0.02   0.02   0.04   0.08
Prep heavy weap for tactical use    0.01   0.02   0.02   0.02   0.02
Perform medical lab procedures      0.00   0.00   0.00   0.00   0.00
Select/lay/clean med/dent equip     0.00   0.00   0.00   0.00   0.00

Note. FRE = Frequency
      CTI = Core Technical Importance
      GSI = General Soldiering Importance
      OJI = Overall Importance
      DIF = Difficulty
      N = the number of participants on which the task means
          for that rating were based
Table B.11

Army Task Questionnaire Mean Ratings: 96B - Intelligence Analyst

                                                   Rating
Task Categories                     FRE    CTI    GSI    OJI    DIF
                                    N=60   N=60   N=60   N=60   N=59

Use maps                            4.30   4.66   4.06   4.43   3.01
Analyze intelligence data           4.10   4.85   2.75   4.31   4.16
Type                                3.56   3.48   2.03   3.18   2.70
Give short oral reports             3.53   4.06   3.51   3.95   2.91
Read tech manl/field manl/etc       3.38   3.90   3.50   3.70   2.74
Record/file/dispatch information    3.30   3.41   2.18   3.11   2.59
Send/receive radio messages         3.26   3.76   3.88   3.93   2.61
Communicate                         3.25   3.55   3.71   3.80   2.81
Act as a model                      3.23   3.13   4.01   3.91   3.30
Sketch maps/overlays/range cards    3.18   3.81   3.18   3.63   3.03
Operate computer hardware           3.13   3.30   2.00   3.06   3.24
Prep technical forms/documents      3.00   3.16   2.31   2.98   2.69
Write/deliver presentations         2.91   4.00   2.43   3.55   3.59
Operate wheeled vehicle             2.91   1.93   3.61   3.46   2.34
Write documents/correspondence      2.88   3.67   2.35   3.32   3.31
Give directions/instructions        2.85   3.30   3.38   3.41   2.59
Lead                                2.70   3.06   3.83   3.58   3.30
Analyze weather conditions          2.68   3.40   2.38   3.06   2.96
Perform op maint chcks/svcs         2.65   1.60   3.55   3.31   2.37
Operate electronic equipment        2.63   2.81   2.70   3.06   2.89
Counsel                             2.56   2.53   3.36   3.26   2.96
Reproduce printed material          2.51   2.15   1.47   2.08   1.27
Decode data                         2.40   2.83   2.98   3.11   3.03
Train                               2.33   2.93   3.35   3.33   2.83
Personnel Administration            2.28   2.25   2.73   2.76   2.37
Monitor/inspect                     2.21   2.28   2.93   2.80   2.49
Protect against NBC hazards         2.16   1.81   4.13   3.73   2.71
Fire individual weapons             2.11   1.60   4.45   3.73   2.62
Draw maps/overlays                  2.03   2.41   1.58   2.00   2.27
Perform op chcks/svcs on weap       1.98   1.67   4.11   3.72   2.38
Survive in the field                1.91   2.01   3.88   3.44   2.70
Use audiovisual equipment           1.78   2.20   1.40   1.96   2.00
Navigate                            1.78   1.96   3.53   3.08   2.93
Give first aid                      1.71   1.30   4.00   3.47   3.15
Plan placement/use tact equip       1.71   1.98   2.31   2.11   2.27
Know customs/laws of war            1.66   1.56   2.83   2.51   2.39
Detect/identify targets             1.58   2.30   3.01   3.03   2.71
Install electronic components       1.56   1.81   2.41   2.51   2.23
Place/camoufl tact equip/mater      1.53   1.03   2.48   2.20   1.76
Operate track vehicle               1.53   1.23   2.10   1.98   1.96
Pack/load materials                 1.48   1.03   1.96   1.83   2.00
Move/react in the field             1.41   1.20   3.20   2.65   2.50

(table continues)
Table B.11 (continued)

                                                   Rating
Task Categories                     FRE    CTI    GSI    OJI    DIF
                                    N=60   N=60   N=60   N=60   N=59

Order equipment/supplies            1.31   1.32   1.83   1.89   1.65
Plan operations                     1.28   1.65   1.86   1.88   2.22
Operate gas/electric power equip    1.26   0.81   1.65   1.63   2.00
Control individuals/crowds          1.21   0.93   2.36   2.11   1.79
Interview                           1.21   1.66   1.23   1.50   1.86
Compute statistics/other math       1.20   1.61   1.03   1.43   1.94
Receive/store/issue supp/equip      1.18   1.00   1.26   1.25   1.20
Provide counseling                  1.15   1.21   1.70   1.65   1.84
Conduct land surveys                0.98   1.28   1.25   1.33   1.33
Use hand grenades                   0.91   0.68   2.51   2.03   1.42
Paint                               0.91   0.13   0.73   0.65   0.67
Use hand & arm signals              0.86   0.75   1.76   1.48   1.11
Draw illustrations                  0.86   1.10   0.68   1.00   1.32
Install wire/cables                 0.76   0.78   1.18   1.13   0.93
Direct/lead teams                   0.70   0.83   1.60   1.40   1.40
Provide programming/DP support      0.56   0.71   0.41   0.70   1.10
Engage in hand-to-hand combat       0.55   0.40   1.55   1.28   1.37
Handle demolitions/mines            0.48   0.33   1.15   0.95   1.10
Produce technical drawings          0.46   0.56   0.35   0.51   0.93
Determine fire data-indirect weap   0.36   0.48   0.60   0.51   0.71
Troubleshoot weapons                0.33   0.30   0.80   0.76   0.59
Inspect electrical systems          0.28   0.16   0.35   0.31   0.50
Analyze electronic signals          0.27   0.47   0.30   0.40   0.74
Translate foreign languages         0.26   0.38   0.21   0.31   0.83
Repair mechanical systems           0.25   0.21   0.56   0.53   0.61
Assemble steel structures           0.23   0.25   0.28   0.30   0.32
Prep equip/supplies for air drop    0.23   0.11   0.35   0.26   0.64
Inspect electronic systems          0.20   0.21   0.25   0.26   0.30
Troubleshoot mechanical systems     0.18   0.08   0.30   0.23   0.39
Receive clients/patients/guests     0.16   0.25   0.18   0.21   0.30
Construct wooden bldgs/struct       0.13   0.05   0.15   0.15   0.20
Repair weapons                      0.10   0.11   0.23   0.20   0.25
Prepare parachutes                  0.10   0.08   0.16   0.13   0.22
Operate radar                       0.08   0.13   0.10   0.16   0.25
Provide medical/dental treatment    0.06   0.05   0.06   0.05   0.11
Control money                       0.06   0.05   0.08   0.08   0.10
Construct masonry bldgs/struct      0.05   0.01   0.08   0.08   0.15
Estimate time/cost of maint ops     0.05   0.05   0.13   0.11   0.13
Repair electronic components        0.03   0.06   0.08   0.08   0.10
Operate lift/load/grade equip       0.03   0.00   0.06   0.05   0.10
Repair metal                        0.03   0.01   0.03   0.03   0.01
Fire indirect fire weapons          0.01   0.00   0.03   0.03   0.03
Operate boats                       0.01   0.00   0.01   0.01   0.01
Cook                                0.01   0.03   0.03   0.05   0.06

(table continues)
Table B.11 (continued)

                                                   Rating
Task Categories                     FRE    CTI    GSI    OJI    DIF
                                    N=60   N=60   N=60   N=60   N=59

Install pipe assemblies             0.00   0.00   0.00   0.00   0.00
Load/unload artillery/tank guns     0.00   0.00   0.00   0.00   0.00
Prep heavy weap for tactical use    0.00   0.00   0.00   0.00   0.00
Fire heavy direct fire weapons      0.00   0.00   0.00   0.00   0.00
Control air traffic                 0.00   0.00   0.00   0.00   0.00
Select/lay/clean med/dent equip     0.00   0.00   0.00   0.00   0.00
Repair electrical systems           0.00   0.00   0.00   0.00   0.00
Operate power excavating equip      0.00   0.00   0.00   0.00   0.00
Perform medical lab procedures      0.00   0.00   0.00   0.00   0.00
Repair plastic/fiberglass           0.00   0.00   0.00   0.00   0.00

Note. FRE = Frequency
      CTI = Core Technical Importance
      GSI = General Soldiering Importance
      OJI = Overall Importance
      DIF = Difficulty
      N = the number of participants on which the task means
          for that rating were based
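
The correlation table that follows relates columns of task means to one another. The report does not state the formula or software used, so the sketch below assumes an ordinary Pearson product-moment correlation; the function name `pearson` and the variable names are illustrative only. It uses the FRE and OJI means for the first five task categories of Table B.11 (96B), purely to keep the example short. A correlation over all task categories would use the full columns.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant column")
    return sxy / sqrt(sxx * syy)


# First five rows of Table B.11 (96B): Use maps, Analyze intelligence data,
# Type, Give short oral reports, Read tech manl/field manl/etc.
fre_96b = [4.30, 4.10, 3.56, 3.53, 3.38]  # FRE = Frequency
oji_96b = [4.43, 4.31, 3.18, 3.95, 3.70]  # OJI = Overall Importance

r = pearson(fre_96b, oji_96b)
print(f"r(FRE, OJI) over five tasks = {r:.2f}")
```

With only five rows the result is not comparable to the tabled coefficients, which are presumably computed across all task categories.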
Correlations of Mean Frequency, Importance, and Difficulty Ratings Among Phase III MOS

           FR    CT    GS    OJ    DF    FR    CT    GS    OJ    DF    FR    CT    GS    OJ    DF    FR    CT    GS    OJ    DF
          12B   12B   12B   12B   12B   13B   13B   13B   13B   13B   27E   27E   27E   27E   27E   29E   29E   29E   29E   29E

FR12B    1.00
CT12B     .99  1.00
GS12B     .97   .96  1.00
OJ12B     .99   .99   .99  1.00
DF12B     .92   .95   .90   .94  1.00
FR13B     .69   .65   .72   .68   .54  1.00
CT13B     .69   .65   .73   .69   .55   .98  1.00
GS13B     .83   .79   .89   .84   .70   .93   .95  1.00
OJ13B     .75   .71   .80   .76   .62   .98   .99   .98  1.00
DF13B     .70   .67   .76   .72   .62   .92   .96   .95   .97  1.00
FR27E     .45   .42   .52   .46   .38   .48   .51   .58   .54   .59  1.00
CT27E     .39   .36   .45   .40   .32   .43   .45   .51   .48   .54   .99  1.00
GS27E     .80   .77   .87   .82   .70   .72   .73   .87   .79   .79   .81   .76  1.00
OJ27E     .64   .61   .72   .66   .55   .61   .63   .75   .69   .72   .95   .92   .94  1.00
DF27E     .55   .53   .63   .57   .51   .53   .56   .67   .61   .68   .95   .93   .88   .97  1.00
FR29E     .41   .38   .46   .41   .31   .38   .37   .46   .41   .44   .88   .88   .71   .84   .84  1.00
CT29E     .34   .31   .39   .34   .25   .32   .32   .39   .35   .39   .87   .88   .65   .80   .81   .99  1.00
GS29E     .80   .77   .87   .81   .67   .67   .67   .83   .75   .72   .75   .70   .95   .89   .82   .77   .72  1.00
OJ29E     .63   .60   .69   .64   .52   .54   .54   .67   .61   .61   .86   .84   .87   .92   .89   .94   .91   .94  1.00
DF29E     .55   .52   .61   .56   .48   .45   .46   .58   .52   .56   .85   .83   .80   .89   .89   .94   .92   .87   .97  1.00
FR31C     .66   .62   .70   .66   .57   .55   .54   .65   .60   .60   .69   .68   .78   .77   .72   .76   .72   .83   .83   .78
CT31C     .62   .59   .67   .62   .55   .52   .52   .62   .57   .59   .71   .71   .77   .78   .75   .79   .76   .82   .84   .82
GS31C     .84   .81   .90   .85   .74   .70   .70   .85   .77   .76   .68   .63   .93   .84   .76   .66   .60   .95   .84   .78
OJ31C     .76   .73   .82   .77   .67   .63   .63   .77   .70   .70   .72   .69   .89   .84   .79   .75   .70   .92   .88   .83
DF31C     .68   .66   .75   .70   .64   .54   .56   .70   .62   .67   .73   .70   .83   .83   .82   .76   .72   .87   .86   .86
FR31D     .59   .56   .62   .58   .50   .52   .50   .58   .54   .53   .71   .72   .73   .74   .68   .78   .76   .76   .79   .74
CT31D     .57   .55   .61   .57   .49   .50   .48   .57   .52   .52   .72   .73   .73   .74   .70   .80   .78   .77   .81   .77
GS31D     .74   .71   .80   .75   .64   .63   .63   .77   .69   .69   .73   .69   .89   .84   .76   .73   .68   .91   .85   .79
OJ31D     .69   .66   .75   .70   .60   .60   .59   .71   .65   .64   .75   .73   .85   .82   .76   .78   .73   .88   .86   .80
DF31D     .60   .58   .67   .62   .56   .49   .49   .62   .55   .59   .75   .74   .79   .81   .78   .80   .77   .83   .86   .84
FR51B     .79   .78   .72   .76   .74   .49   .45   .56   .50   .44   .42   .38   .63   .53   .43   .38   .33   .63   .52   .43
CT51B     .71   .73   .63   .69   .71   .39   .36   .45   .39   .35   .38   .36   .53   .46   .38   .35   .32   .54   .46   .38

GS51B 

.94 

.92 

.95 

.94 

.85 

.68 

.68 

.83 

.75 

.70 

.52 

.45 

.86 

.71 

.61 

.46 

.39 

.86 

.69 

.60 

0J51B 

.87 

.87 

.83 

.86 

.83 

.55 

.53 

.67 

.59 

.54 

.44 

.39 

.72 

.59 

.50 

.39 

.34 

.73 

.59 

.50 

0F51B 

.81 

.84 

.77 

.82 

.84 

.45 

.44 

.58 

.50 

.48 

.39 

.35 

.63 

.53 

.47 

.35 

.31 

.65 

.53 

.48 

FR54B 

.85 

.82 

.88 

.85 

.75 

.70 

.70 

.82 

.75 

.73 

.60 

.56 

.87 

.76 

.69 

.55 

.49 

.86 

.73 

.66 

CT548 

.84 

.82 

.87 

.85 

.76 

.67 

.68 

.81 

.73 

.73 

.60 

.57 

.85 

.76 

.69 

.54 

.50 

.84 

.72 

.66 

G5548 

.90 

.88 

.95 

.91 

.80 

.73 

.74 

.89 

.81 

.79 

.59 

.53 

.91 

.78 

.71 

.52 

.45 

.89 

.74 

.66 

0J548 

.89 

.86 

.93 

.90 

.79 

.71 

.73 

.87 

.79 

.78 

.61 

.56 

.90 

.78 

.71 

.54 

.48 

.89 

.75 

.68 

OF 548 

.82 

.81 

.86 

.84 

.81 

.61 

.64 

.78 

.70 

.75 

.57 

.53 

.82 

.73 

.72 

.50 

.45 

.79 

.69 

.67 

FR55B 

.72 

.67 

.71 

.70 

.63 

.57 

.55 

.65 

.60 

.57 

.48 

.44 

.71 

.60 

.51 

.44 

.39 

.70 

.59 

.52 

CT55B 

.73 

.69 

.74 

.72 

.65 

.58 

.56 

.67 

.62 

.59 

.47 

.42 

.72 

.60 

.53 

.43 

.38 

.71 

.60 

.53 

GS55B 

.86 

.82 

.89 

.86 

.75 

.69 

.6B 

.82 

.75 

.71 

.54 

.47 

.B7 

.72 

.62 

.47 

.41 

.86 

.70 

.61 

OJ55B 

.81 

.77 

.83 

.80 

.71 

.64 

.64 

.77 

.70 

.67 

.51 

.45 

.81 

.68 

.59 

.46 

.40 

.81 

.66 

.59 

DF55B 

.78 

.75 

.81 

.79 

.74 

-  .61 

.61 

.74 

.67 

.67 

.50 

.44 

.79 

.66 

.61 

.44 

.38 

.77 

.64 

.59 

FR95B 

.79 

.75 

.85 

.80 

.69 

.63 

.62 

.78 

.69 

.67 

.50 

.43 

.82 

.67 

.59 

.48 

.40 

.83 

.68 

.60 

CT958 

.80 

.77 

.87 

.82 

.72 

.62 

.62 

.79 

.70 

.69 

.48 

.41 

.82 

.67 

.60 

.46 

.38 

.83 

.67 

.60 

GS958 

.87 

.84 

.93 

.89 

.77 

.68 

.69 

.86 

.77 

.75 

.51 

.44 

.88 

.72 

.63 

.47 

.39 

.88 

.70 

.63 

0J95B 

.83 

.80 

.90 

.85 

.74 

.65 

.65 

.82 

.73 

.72 

.50 

.43 

.85 

.70 

.62 

.47 

.39 

.85 

.69 

.62 

DF95B 

.75 

.73 

.81 

.77 

.73 

.54 

.55 

.72 

.63 

.66 

.44 

.38 

.76 

.63 

.59 

.41 

.34 

.76 

.61 

.59 

FR968 

.54 

.50 

.58 

.53 

.48 

.39 

.37 

.49 

.43 

.43 

.39 

.37 

.60 

.50 

.46 

.43 

.38 

.61 

.52 

.48 

CT96B 

.44 

.42 

.48 

.45 

.42 

.29 

.28 

.39 

.33 

.36 

.33 

.31 

.50 

.41 

.39 

.36 

.33 

.51 

.43 

.42 

GS96B 

.81 

.78 

.88 

.83 

.73 

.63 

.63 

.79 

.71 

.69 

.52 

.47 

.85 

.71 

.63 

.52 

.45 

.87 

.72 

.65 

0J968 

.69 

.66 

.75 

.70 

.63 

.52 

.51 

.66 

.58 

.58 

.46 

.42 

.74 

.62 

.56 

.47 

.42 

.76 

.63 

.58 

DF968 

.60 

.57 

.66 

.62 

.59 

.41 

.40 

.55 

.47 

.50 

.38 

.35 

.64 

.53 

.50 

.40 

.36 

.66 

.55 

.53 

(table  continues) 
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FR31C 
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CT31C 

.99 

1.00 

GS31C 

.89

.88 

1.00 

OJ31C
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.96 

.97 

1.00 

DF31C
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FR310 
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.50 
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.44 
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.65 
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.77 

.68 

.59 

.58 

.80 
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0JS1B 

.57 

.53 
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.65 
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.52 
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.63 

.55 

DF51B 
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.49 
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.57 
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.69 
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CT548 
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GS54B 
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.72 
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.84 
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.75 

.87 

.84 
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.63 

.62 

.76 

.72 

.71 

FR55B  .58  .53  .68  .63  .54  .52  .50  .63  .59  .50
CT55B  .57  .53  .70  .64  .55  .51  .49  .64  .59  .49
GS55B  .65  .61  .86  .77  .68  .57  .55  .78  .70  .61
OJ55B  .62  .58  .79  .72  .63  .53  .52  .71  .65  .56
DF55B  .62  .59  .78  .72  .66  .52  .50  .68  .63  .56
FR95B  .72  .68  .86  .80  .72  .62  .61  .77  .74  .66
CT95B  .71  .67  .86  .80  .73  .60  .58  .77  .73  .66
GS95B  .73  .70  .92  .84  .78  .62  .61  .81  .76  .69
OJ95B  .72  .68  .89  .82  .75  .61  .60  .79  .74  .68
DF95B  .68  .65  .82  .77  .75  .54  .53  .71  .66  .64
FR96B  .68  .66  .67  .67  .62  .57  .57  .61  .61  .56
CT96B  .58  .57  .57  .58  .55  .49  .50  .51  .52  .50
GS96B  .78  .74  .91  .86  .79  .66  .65  .82  .78  .71
OJ96B  .73  .71  .81  .78  .72  .61  .60  .73  .70  .65
DF96B  .66  .65  .72  .71  .69  .55  .54  .64  .61  .60
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51B 

51B 

51B 

51B 

54B
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54B 
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54B 

FR51B 1.00
CT51B  .97 1.00
GS51B  .85  .78 1.00
OJ51B  .97  .93  .94 1.00
DF51B  .93  .93  .88  .97 1.00
FR54B  .66  .58  .84  .73  .66 1.00
CT54B  .62  .56  .82  .71  .65  .99 1.00
GS54B  .64  .54  .90  .76  .68  .96  .95 1.00
OJ54B  .64  .56  .88  .75  .68  .98  .98  .99 1.00
DF54B  .53  .48  .78  .65  .63  .92  .94  .93  .95 1.00
FR55B  .64  .57  .73  .65  .58  .73  .70  .71  .72  .66
CT55B  .62  .55  .74  .66  .59  .76  .73  .74  .75  .69
GS55B  .68  .58  .89  .77  .68  .86  .83  .89  .88  .80
OJ55B  .65  .56  .83  .72  .64  .82  .79  .83  .83  .76
DF55B  .60  .51  .79  .68  .62  .81  .79  .82  .83  .80
FR95B  .55  .44  .80  .64  .55  .82  .79  .87  .85  .78
CT95B  .52  .42  .80  .63  .55  .81  .80  .88  .86  .80
GS95B  .59  .47  .88  .71  .63  .86  .85  .94  .91  .84
OJ95B  .55  .44  .83  .67  .59  .84  .82  .91  .89  .83
DF95B  .44  .34  .72  .56  .51  .78  .77  .84  .83  .84
FR96B  .35  .28  .49  .37  .31  .69  .68  .65  .67  .66
CT96B  .22  .18  .38  .26  .22  .58  .60  .56  .58  .60
GS96B  .55  .45  .80  .65  .56  .87  .86  .91  .90  .85
OJ96B  .44  .36  .66  .52  .45  .80  .79  .80  .81  .79
DF96B  .34  .27  .55  .42  .37  .72  .73  .72  .74  .77

(table continues)


C-2 


       55B  55B  55B  55B  55B  95B  95B  95B  95B  95B  96B  96B  96B  96B  96B
        FR   CT   GS   OJ   DF   FR   CT   GS   OJ   DF   FR   CT   GS   OJ   DF

FR55B 1.00
CT55B  .99 1.00
GS55B  .92  .93 1.00
OJ55B  .96  .98  .99 1.00
DF55B  .93  .96  .95  .97 1.00
FR95B  .67  .68  .82  .77  .75 1.00
CT95B  .64  .66  .82  .76  .75  .99 1.00
GS95B  .67  .70  .87  .80  .79  .96  .97 1.00
OJ95B  .66  .68  .84  .78  .77  .99 1.00  .99 1.00
DF95B  .60  .62  .76  .71  .74  .93  .95  .93  .95 1.00
FR96B  .51  .50  .56  .54  .56  .73  .70  .66  .70  .71 1.00
CT96B  .38  .38  .45  .42  .45  .66  .64  .59  .62  .67  .97 1.00
GS96B  .67  .68  .83  .77  .76  .90  .90  .92  .91  .87  .86  .79 1.00
OJ96B  .59  .59  .72  .67  .67  .84  .83  .82  .84  .83  .95  .91  .97 1.00
DF96B  .51  .52  .62  .58  .61  .77  .78  .75  .77  .83  .93  .91  .91  .97 1.00
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Means  and  Standard  Deviations  for  18  MOS 
on  26  Attribute  Measures  and  2  Criterion  Measures 
Project  A  Concurrent  Validation  Samples 

Key  to  Attribute  Abbreviations  of  Column  Headings 

Verb  =  Verbal  Ability 

Reas  =  Reasoning 

Numb  =  Number  Ability 

Spat  =  Spatial  Ability 

InPr  =  Mental  Information  Processing 

PS&A  =  Perceptual  Speed  &  Accuracy 

Mem  =  Memory 

Mech  =  Mechanical  Comprehension 

E-LC  =  Eye-Limb  Coordination 

Prec  =  Precision 

MJud  =  Movement  Judgment 

Dext  =  Hand  &  Finger  Dexterity 

Athl  =  Involvement  in  Athletics 

WkOr  =  Work  Orientation 

Coop  =  Cooperation/Stability 

Ener  =  Energy 

Cons  =  Conscientiousness 

Dom  =  Dominance/Confidence 

Tool  =  Interest  in  Using  Tools 

Rugd  =  Interest  in  Rugged  Activities 

Prot  =  Interest  in  Protective  Services 

Tech  =  Interest  in  Technical  Activities 

Sci  =  Interest  in  Science 

Lead  =  Interest  in  Leadership 

Art  =  Interest  in  Artistic  Activities 

Org  =  Interest  in  Efficiency  &  Organization 


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Normalized  Attribute  Weights  for  18  MOS:  Core  Technical  Proficiency,  Mean  Attribute  Validities 
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Normalized  Attribute  Weights  for  18  MOS:  Core  Technical  Proficiency,  Mean  Attribute  Validities 
and  MOS  Mean  Component  Weights,  Top  5  Stepwise  Reduction 
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Table  E.6 


Normalized Attribute Weights for 18 MOS: Core Technical
Proficiency, Mean Attribute Validities and MOS Mean Component
Weights, ASVAB Reduction

             Attribute
MOS   Verb¹   Numb²   Mech³
11B     .47     .33     .31
12B     .46     .33     .32
13B     .45     .33     .33
16S     .47     .33     .31
19K     .44     .33     .34
27E     .45     .32     .34
31C     .47     .33     .31
51B     .44     .33     .34
54B     .47     .34     .30
55B     .47     .33     .30
63B     .45     .33     .33
67N     .46     .32     .32
71L     .50     .33     .27
76Y     .48     .33     .29
88M     .47     .33     .31
91A     .50     .33     .27
94B     .48     .34     .28
95B     .50     .33     .28

¹Verb = Project A measure A1AVERBL. ²Numb = Project A measures
A1AQUANT + B3CCNMSH. ³Mech = Project A measure A1ATECH.
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E-13 


Table E.10


Normalized Attribute Weights for 18 MOS: Core Technical
Proficiency, Mean Attribute Validities and MOS Threshold
Component Weights, ASVAB Reduction

             Attribute
MOS   Verb¹   Numb²   Mech³
11B     .46     .34     .31
12B     .42     .33     .36
13B     .39     .32     .40
16S     .45     .33     .33
19K     .43     .33     .35
27E     .38     .29     .44
31C     .45     .32     .34
51B     .37     .34     .40
54B     .45     .33     .33
55B     .43     .32     .36
63B     .42     .30     .40
67N     .46     .28     .36
71L     .52     .33     .25
76Y     .53     .37     .19
88M     .44     .31     .36
91A     .55     .33     .21
94B     .49     .37     .24
95B     .51     .32     .28

¹Verb = Project A measure A1AVERBL. ²Numb = Project A measures
A1AQUANT + B3CCNMSH. ³Mech = Project A measure A1ATECH.


E-15 


Normalized  Attribute  Weights  for  18  MOS:  Core  Technical  Proficiency,  Mean  Attribute  Validities 
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Weights  for  18  MOS:  Core  Technical  Proficiency,  0-Mean  Attribute  Weights 
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E-20 


Table E.16

Normalized Attribute Weights for 18 MOS: Core Technical
Proficiency, Mean Attribute Validities and Cluster Mean
Component Weights, ASVAB Reduction

             Attribute
MOS   Verb¹   Numb²   Mech³
11B     .46     .33     .31
12B     .46     .33     .31
13B     .46     .33     .31
16S     .46     .33     .31
19K     .46     .33     .31
27E     .46     .32     .32
31C     .46     .32     .32
51B     .46     .33     .33
54B     .46     .33     .31
55B     .49     .33     .28
63B     .46     .33     .33
67N     .46     .33     .33
71L     .49     .33     .28
76Y     .49     .33     .28
88M     .46     .33     .33
91A     .49     .33     .28
94B     .49     .33     .28
95B     .46     .33     .31

¹Verb = Project A measure A1AVERBL. ²Numb = Project A measures
A1AQUANT + B3CCNMSH. ³Mech = Project A measure A1ATECH.
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E-22 
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E-24 


Table E.20


Normalized Attribute Weights for 18 MOS: Core Technical
Proficiency, Mean Attribute Validities and Cluster Threshold
Component Weights, ASVAB Reduction

             Attribute
MOS   Verb¹   Numb²   Mech³
11B     .44     .33     .34
12B     .44     .33     .34
13B     .44     .33     .34
16S     .44     .33     .34
19K     .44     .33     .34
27E     .39  [illegible]  .42
31C     .39     .30     .42
51B     .45     .27     .38
54B     .44     .33     .34
55B     .59     .29     .21
63B     .45     .27     .38
67N     .45     .27     .38
71L     .59     .29     .21
76Y     .59     .29     .21
88M     .45     .27     .38
91A     .59     .29     .21
94B     .59     .29     .21
95B     .44     .33     .34

¹Verb = Project A measure A1AVERBL. ²Numb = Project A measures
A1AQUANT + B3CCNMSH. ³Mech = Project A measure A1ATECH.
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Normalized  Attribute  Weights  for  18  MOS:  Overall  Job  Performance,  0-Mean  Attribute  Weights  and 


E-28


Normalized  Attribute  Weights  for  18  MOS:  Overall  Performance,  Mean  Attribute  Validities  and 
MOS  Mean  Component  Weights,  .95  Stepwise  Reduction 


E-29 


E-30 


Table E.26


Normalized Attribute Weights for 18 MOS: Overall Performance,
Mean Attribute Validities and MOS Mean Component Weights, ASVAB
Reduction

             Attribute
MOS   Verb¹   Numb²   Mech³
11B     .47     .33     .31
12B     .46     .33     .31
13B     .45     .33     .33
16S     .47     .33     .30
19K     .45     .33     .33
27E     .46     .32     .32
31C     .47     .33     .31
51B     .45     .33     .33
54B     .47     .33     .30
55B     .47     .33     .30
63B     .45     .33     .33
67N     .47     .33     .32
71L     .50     .33     .27
76Y     .48     .33     .29
88M     .47     .33     .31
91A     .48     .33     .29
94B     .48     .34     .29
95B     .49     .33     .28

¹Verb = Project A measure A1AVERBL. ²Numb = Project A measures
A1AQUANT + B3CCNMSH. ³Mech = Project A measure A1ATECH.
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E-32 


E-33 


E-34 


Table E.30


Normalized Attribute Weights for 18 MOS: Overall Performance,
Mean Attribute Validities and MOS Threshold Component Weights,
ASVAB Reduction

             Attribute
MOS   Verb¹   Numb²   Mech³
11B     .46     .34     .31
12B     .46     .33     .32
13B     .41     .33     .37
16S     .47     .33     .31
19K     .43     .33     .35
27E     .44     .31     .36
31C     .47     .32     .31
51B     .41     .32     .38
54B     .44     .33     .34
55B     .42     .31     .38
63B     .42     .30     .40
67N     .45     .30     .37
71L     .52     .33     .25
76Y     .46     .36     .29
88M     .43     .31     .37
91A     .42     .34     .35
94B     .45     .33     .32
95B     .49     .32     .29

¹Verb = Project A measure A1AVERBL. ²Numb = Project A measures
A1AQUANT + B3CCNMSH. ³Mech = Project A measure A1ATECH.


E-35 


E-36 


E-39 


Normalized  Attribute  Weights  for  18  MOS:  Overall  Performance,  Mean  Attribute  Validities  and 
Cluster  Mean  Component  Weights,  Top  5  Stepwise  Reduction 


E-40 


Table E.36


Normalized Attribute Weights for 18 MOS: Overall Performance,
Mean Attribute Validities and Cluster Mean Component Weights,
ASVAB Reduction

             Attribute
MOS   Verb¹   Numb²   Mech³
11B     .46     .33     .31
12B     .46     .33     .31
13B     .46     .33     .31
16S     .46     .33     .31
19K     .46     .33     .31
27E     .46     .33     .32
31C     .46     .33     .32
51B     .46     .33     .31
54B     .46     .33     .31
55B     .48     .33     .29
63B     .46     .33     .32
67N     .46     .33  [illegible]
71L     .48     .33     .29
76Y     .48     .33     .29
88M     .48     .33     .29
91A     .48     .33     .29
94B     .48     .33     .29
95B     .46     .33     .31

¹Verb = Project A measure A1AVERBL. ²Numb = Project A measures
A1AQUANT + B3CCNMSH. ³Mech = Project A measure A1ATECH.


E-41


E-42


Normalized Attribute Weights for 18 MOS: Overall Performance, 0-1 Attribute Weights and
Cluster Threshold Component Weights
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Table  E.40 


Normalized Attribute Weights for 18 MOS: Overall Performance,
Mean Attribute Validities and Cluster Threshold Component Weights,
ASVAB Reduction

             Attribute
MOS   Verb¹   Numb²   Mech³
11B     .45     .33     .33
12B     .45     .33     .33
13B     .45     .33     .33
16S     .45     .33     .33
19K     .45     .33     .33
27E     .45     .31     .35
31C     .45     .31     .35
51B     .45     .33     .33
54B     .45     .33     .33
55B     .44     .31     .35
63B     .45     .31     .35
67N     .45     .31     .35
71L     .44     .31     .35
76Y     .44     .31     .35
88M     .44     .31     .35
91A     .44     .31     .35
94B     .44     .31     .35
95B     .45     .33     .33

¹Verb = Project A measure A1AVERBL. ²Numb = Project A measures
A1AQUANT + B3CCNMSH. ³Mech = Project A measure A1ATECH.


E-45 


E-46 


Table E.42


Least Squares Beta Weights for 18 MOS: Core Technical
Proficiency, ASVAB Reduction

             Attribute
MOS   Verb¹   Numb²   Mech³
11B     .26     .22     .28
12B     .17     .12     .44
13B     .06     .16     .21
16S     .23     .33    -.02
19K     .14     .23     .33
27E     .29     .33     .26
31C     .16     .35     .19
51B     .40     .29     .24
54B     .26     .26     .29
55B     .39     .05     .33
63B    -.07     .05     .70
67N     .12     .12     .63
71L     .29     .44    -.13
76Y     .16     .45     .10
88M    -.04     .22     .47
91A     .38     .24     .17
94B     .20     .51     .03
95B     .22     .32     .16

¹Verb = Project A measure A1AVERBL. ²Numb = Project A measures
A1AQUANT + B3CCNMSH. ³Mech = Project A measure A1ATECH.
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Table  E.44 


Least Squares Beta Weights for 18 MOS: Overall Job Performance,
ASVAB Reduction

             Attribute
MOS   Verb¹   Numb²   Mech³
11B     .15     .25     .24
12B     .12     .27     .26
13B    -.09     .25     .21
16S     .26     .25     .06
19K     .10     .31     .27
27E     .35     .19     .31
31C    -.02     .36     .27
51B     .27     .24     .32
54B     .13     .35     .25
55B     .15     .15     .26
63B    -.02     .06     .47
67N     .22     .14     .51
71L     .13     .34     .14
76Y     .12     .32     .16
88M    -.14     .26     .43
91A     .13     .24     .27
94B     .03     .48     .01
95B     .09     .32     .33

¹Verb = Project A measure A1AVERBL. ²Numb = Project A measures
A1AQUANT + B3CCNMSH. ³Mech = Project A measure A1ATECH.
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Appendix  F 


Validities  of  Synthetically  Formed  Prediction  Equations 
for  18  MOS  by  Different  Criterion  Measures 
and  by  Different  Weighting  Methods 
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Equations  for  18  MOS:  Core  Technical  Proficiency, 

Mean  Attribute  Validities  and  MOS  Mean  Component 
Weights 

Table  F.2  Validities  of  Synthetically  Formed  Prediction 

Equations  for  18  MOS:  Core  Technical  Proficiency,  0-1 
Attribute  Weights  and  MOS  Mean  Component  Weights 

Table  F.3  Validities  of  Synthetically  Formed  Prediction 

Equations  for  18  MOS:  Core  Technical  Proficiency, 
0-Mean  Attribute  Weights  and  MOS  Mean  Component 
Weights 

Table  F.4  Validities  of  Synthetically  Formed  Prediction 

Equations  for  18  MOS:  Core  Technical  Proficiency, 

Mean  Attribute  Validities  and  MOS  Mean  Component 
Weights,  .95  Stepwise  Reduction 

Table F.5  Validities of Synthetically Formed Prediction

Equations  for  18  MOS:  Core  Technical  Proficiency, 

Mean  Attribute  Validities  and  MOS  Mean  Component 
Weights,  Top  5  Stepwise  Reduction 

Table  F.6  Validities  of  Synthetically  Formed  Prediction 
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0-Mean  Attribute  Weights  and  MOS  Threshold  Component 
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Table  F.10  Validities  of  Synthetically  Formed  Prediction 

Equations  for  18  MOS:  Core  Technical  Proficiency, 
Mean  Attribute  Validities  and  MOS  Threshold  Component 
Weights, ASVAB Reduction
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Equations  for  18  MOS:  Core  Technical  Proficiency, 
Mean  Attribute  Validities  and  Cluster  Mean  Component 
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0-1  Attribute  Weights  and  Cluster  Mean  Component 
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Equations  for  18  MOS:  Core  Technical  Proficiency, 
0-Mean  Attribute  Weights  and  Cluster  Mean  Component 
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Equations  for  18  MOS:  Core  Technical  Proficiency, 
Mean  Attribute  Validities  and  Cluster  Threshold 
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0-1  Attribute  Weights  and  Cluster  Threshold  Component 
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Equations  for  18  MOS:  Core  Technical  Proficiency, 
0-Mean  Attribute  Weights  and  Cluster  Threshold 
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Equations  for  18  MOS:  Overall  Performance,  0-Mean 
Attribute  Weights  and  MOS  Mean  Component  Weights 

Table  F.24  Validities  of  Synthetically  Formed  Prediction 

Equations  for  18  MOS:  Overall  Performance,  Mean 
Attribute  Validities  and  MOS  Mean  Component  Weights, 
.95  Stepwise  Reduction 
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Equations  for  18  MOS:  Overall  Performance,  Mean 
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Equations  for  18  MOS:  Overall  Performance,  Mean 
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Equations  for  18  MOS:  Overall  Performance,  Mean 
Attribute  Validities  and  MOS  Threshold  Component 
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Table  F.28  Validities  of  Synthetically  Formed  Prediction 

Equations  for  18  MOS:  Overall  Performance,  0-1 
Attribute  Weights  and  MOS  Threshold  Component  Weights 

Table  F.29  Validities  of  Synthetically  Formed  Prediction 

Equations  for  18  MOS:  Overall  Performance,  0-Mean 
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Table  F.42  Validities  of  Least  Squares  Prediction  Equations  for 
18 MOS: Core Technical Proficiency, ASVAB Reduction

Table  F.43  Validities  of  Least  Squares  Prediction  Equations  for 
18  MOS:  Overall  Performance,  No  Reduction 

Table  F.44  Validities  of  Least  Squares  Prediction  Equations  for 
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Appendix  G 


Task  Complexity  Questionnaire  Item  Means  and 
Standard  Deviations  for  Different  Tasks 


Table G.1

Task Complexity Questionnaire Item Means and Standard Deviations
for an Electrical and Electronic Systems Maintenance Task

                                                     MOS
                                               27E         29E
Task Complexity Items                       Mean  S.D.  Mean  S.D.

 1. Are job or memory aids used?            1.28  0.46  1.17  0.38
 2. Quality of job aids.                    3.84  0.90  3.76  0.88
 3. How many steps are task divided?        2.72  0.68  2.86  0.65
 4. Steps performed in definite sequence?   3.24  0.44  3.07  0.38
 5. Built-in feedback?                      2.84  0.80  3.04  0.96
 6. Time limit for completion?              1.20  0.41  1.50  0.58
 7. Mental processing requirements?         1.84  0.62  2.04  0.58
 8. Number of facts, terms, etc. memorize?  2.08  0.86  2.46  1.04
 9. How hard are the facts or terms?        2.08  0.28  2.25  0.52
10. What are the motor control demands?     1.92  0.64  2.32  0.61

G-l 


Table G.2


Task Complexity Questionnaire Item Means and Standard Deviations
for a Vehicle and Equipment Operations Task

                                                                 MOS
                                               12B         13B         51B         54B
Task Complexity Items                       Mean  S.D.  Mean  S.D.  Mean  S.D.  Mean  S.D.

 1. Are job or memory aids used?            1.30  0.46  1.47  0.50  1.23  0.42  1.36  0.48
 2. Quality of job aids.                    3.77  0.80  3.52  0.96  3.65  0.83  3.67  0.82
 3. How many steps are task divided?        3.14  0.89  2.94  0.85  3.18  0.88  2.85  0.86
 4. Steps performed in definite sequence?   3.23  0.45  3.21  0.58  3.29  0.58  3.13  0.54
 5. Built-in feedback?                      2.46  0.84  2.46  0.95  2.51  0.82  2.48  0.76
 6. Time limit for completion?              1.30  0.49  1.49  0.58  1.22  0.47  1.26  0.47
 7. Mental processing requirements?         2.13  0.63  2.03  0.71  2.06  0.63  2.10  0.69
 8. Number of facts, terms, etc. memorize?  2.62  0.90  2.38  0.98  2.74  0.99  2.65  0.89
 9. How hard are the facts or terms?        2.13  0.43  2.17  0.63  2.17  0.63  2.14  0.39
10. What are the motor control demands?     2.84  0.46  2.72  0.65  2.85  0.43  2.84  0.50

G-2 


Table G.3

Task Complexity Questionnaire Item Means and Standard Deviations
for a Clerical Task

                                                     MOS
                                               55B         95B
Task Complexity Items                       Mean  S.D.  Mean  S.D.

 1. Are job or memory aids used?            1.28  0.46  1.18  0.39
 2. Quality of job aids.                    4.03  0.70  3.53  0.76
 3. How many steps are task divided?        2.50  0.77  2.36  0.80
 4. Steps performed in definite sequence?   3.12  0.83  3.24  0.68
 5. Built-in feedback?                      2.71  1.22  2.82  1.17
 6. Time limit for completion?              1.22  0.47  1.67  0.60
 7. Mental processing requirements?         2.12  0.63  2.33  0.56
 8. Number of facts, terms, etc. memorize?  2.31  0.89  2.58  1.01
 9. How hard are the facts or terms?        2.06  0.63  2.31  0.76
10. What are the motor control demands?     1.78  0.85  2.36  0.86
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Task  Complexity  Questionnaire  Item  Means  and  Standard  Deviations 
for  a  Communication  Task 


                                                           MOS
                                               31C         31D         96B
Task Complexity Items                       Mean  S.D.  Mean  S.D.  Mean  S.D.

 1. Are job or memory aids used?            1.26  0.44  1.06  0.24  1.15  0.36
 2. Quality of job aids.                    3.91  0.80  3.81  0.66  3.78  0.82
 3. How many steps are task divided?        2.37  0.63  2.24  0.44  2.39  0.49
 4. Steps performed in definite sequence?   3.70  0.49  3.47  0.51  3.55  0.56
 5. Built-in feedback?                      2.51  1.20  2.53  1.12  2.55  1.04
 6. Time limit for completion?              1.77  0.51  1.82  0.39  1.63  0.52
 7. Mental processing requirements?         1.88  0.61  1.76  0.56  1.98  0.59
 8. Number of facts, terms, etc. memorize?  2.38  0.86  2.53  1.07  2.89  0.77
 9. How hard are the facts or terms?        2.08  0.42  2.12  0.33  2.32  0.50
10. What are the motor control demands?     2.00  0.69  1.88  0.60  2.08  0.52
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ADDENDUM  TO  VOLUME  I  OF  THE  ARMY  SYNTHETIC  VALIDITY 
PROJECT:  REPORT  OF  PHASE  III  RESULTS 

An  Investigation  of  the  Use  of  Least  Squares  Validity 
Estimators  and  Correction  Formulas  when  Population 
Values  are  Available  for  Predictor  Intercorrelations 

Rodney  L.  Rosse  and  Norman  G.  Peterson 


Introduction 


Two general approaches were used to create prediction functions
for this project: (1) the synthetic validity method, wherein the
weights applied to predictors for each MOS were determined by an
experimental design that used no information from the actual
data, and (2) the empirical approach, wherein least squares
regression weights were computed for each MOS. As made plain in
the body of the report (see Chapter 4), the objective was a
direct comparison of the absolute and discriminant validities of
the two approaches.

One troubling issue arises with the direct comparison of
validity estimates from the two approaches: the estimates are
different statistics with different statistical properties.
Therefore, it is not known with certainty that a direct
comparison of the validities is appropriate. How do the two
statistics differ?

In general, the validity of a predictor for predicting a
particular criterion is defined as the correlation of the
predictor and the criterion in the population from which the
sample was drawn. The synthetic validity approach yields weighted
composites of predictors obtained directly from information
external to the existing sample. This means that a simple
correlation between any one synthetically formed composite and a
criterion is a relatively good (unbiased) estimate of the
validity of the composite for predicting that criterion. In the
empirical approach, however, the weights used in forming
predictor composites are determined from the sample data: the
least squares regression weights are optimized for
idiosyncrasies of the specific sample that may not be
characteristic of the population. The correlation between the
least squares weighted composite and the criterion computed from
sample data is often called the "foldback" correlation to denote
the fact that it is subject to these idiosyncrasies. It is well
known that the correlation between the least squares weighted
composite and the criterion, when computed using the sample data,
leads to an estimate that has substantial positive bias, with the
amount of bias varying inversely with the size of the sample.

Various  "shrinkage"  formulas  have  been  proposed  and  tested 
for  correcting  the  positive  bias  in  the  foldback  correlation  so 
as  to  produce  a  relatively  unbiased  estimate  of  the  validity  of 
the  least  squares  composite.  Several  of  the  formulas  have  proven 
in  practice  and  in  Monte  Carlo  studies  to  yield  estimates  of  the 
validity  which  have  very  little  bias  under  a  variety  of  condi¬ 
tions.  Thus,  if  the  appropriate  formula  could  be  used,  the 
estimators  of  absolute  validity  for  the  two  methods  could  both  be 
made  to  be  unbiased — making  it  more  reasonable  to  directly  com¬ 
pare  the  two  methods. 

In  this  project,  as  explained  in  Chapter  4,  we  applied  three 
such  formulas  to  foldback  correlations: 

1. Claudy (1978) Equation 12 (actually an estimate of
the population Multiple R which is used to obtain
an estimate of the validity coefficient);

2. Claudy (1978) Validity Estimate (uses Equation 12
to obtain a validity coefficient estimate); and

3. Rozeboom (1978) Equation 8.

We  discussed  these  formulas  and  showed  the  results  of  their 
application  to  the  samples  in  this  project  in  Chapter  4  (see 
Tables  4.9  and  4.10). 

All  of  the  shrinkage  formulas  are  based  upon  the  practice  of 
computing  the  regression  weights  and  the  foldback  correlation 
using  sample  data  for  both  predictor  intercorrelations  and  pre¬ 
dictor-criterion  correlations,  which  is  the  case  normally  encoun¬ 
tered.  However,  because  of  a  unique  opportunity,  we  were  able  to 
use  "population"  data  for  predictor  intercorrelations  rather  than 
relying  on  separate  sample  estimates  (from  each  MOS)  of  the 
predictor intercorrelations. Note that this method of computing
regression  weights,  while  it  seems  desirable,  does  not  comply 
with  the  conditions  for  which  the  shrinkage  formulas  were  de¬ 
veloped. 


To  explicate  further,  in  the  usual  case,  the  computa¬ 
tion  of  least  squares  regression  weights  is  accomplished  by  the 
following  formula: 

B = RXX^{-1} RXY     [1]
where RXX is a matrix of intercorrelations between predictor
variables and RXY is a column vector of correlations between each
predictor variable and the criterion. The weights, B, are least
squares weights for computing predicted scores (or composites)
using predictor scores transformed to z-scores (mean = 0 and
variance = 1). Then the foldback correlation can be readily
computed as

r = (B' RXY)^{1/2}     [2]

and  it  is  this  value  which  is  used  in  the  further  computation  of 
any  one  of  the  above  three  shrinkage  formulas. 

In  equation  [1],  which  defines  the  weights,  B,  the  matrix  of 
intercorrelations  between  predictors,  RXX,  is  normally  based  upon 
the  particular  sample,  or,  in  this  case,  the  sample  for  a  partic¬ 
ular  MOS.  However,  as  noted  just  above,  this  was  not  done  for 
the  initial  round  of  validation  analyses.  The  matrix,  RXX,  was 
not  used  because  it  was  not  necessary  to  estimate  the  intercorre¬ 
lations  between  predictors  separately  for  each  sample.  For  all 
practical  purposes,  the  population  values  were  known.  Specifi¬ 
cally,  the  predictor  correlations  that  were  to  be  found  in  the 
applicant  population  were  estimated  using  a  very  large  number  of 
soldiers  (all  7045  soldiers  in  the  total  sample) .  Conventional 
wisdom  led  us  to  substitute  that  "population"  correlation  matrix 
for  the  sample  correlation  matrix.  Thus,  the  least  squares 
regression  weights  were  computed  as  follows: 

BP = RHOXX^{-1} RXY     [3]

where RHOXX is the population matrix of predictor intercorrelations.
The foldback correlation was then computed in the same way as
before (Equation [2]).
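
The two ways of computing weights in Equations [1] and [3], and the foldback correlation of Equation [2], can be sketched as follows. This is only an illustrative sketch: the three-predictor matrices are invented values, not project data, and the helper names ls_weights and foldback are hypothetical.

```python
import numpy as np

def ls_weights(r_pred, r_xy):
    """Standardized least squares weights: solves r_pred @ b = r_xy."""
    return np.linalg.solve(r_pred, r_xy)

def foldback(b, r_xy):
    """Foldback correlation as defined in Equation [2]: sqrt(b' r_xy)."""
    return float(np.sqrt(b @ r_xy))

# Invented 3-predictor example (assumption: values chosen only for illustration).
rxx = np.array([[1.0, 0.3, 0.2],
                [0.3, 1.0, 0.4],
                [0.2, 0.4, 1.0]])        # sample predictor intercorrelations
rhoxx = np.array([[1.0, 0.25, 0.25],
                  [0.25, 1.0, 0.35],
                  [0.25, 0.35, 1.0]])    # "population" predictor intercorrelations
rxy = np.array([0.5, 0.4, 0.3])          # sample predictor-criterion correlations

b = ls_weights(rxx, rxy)       # Equation [1]: sample RXX
bp = ls_weights(rhoxx, rxy)    # Equation [3]: RHOXX substituted for RXX
r_fold = foldback(b, rxy)      # Equation [2] applied to B
rp_fold = foldback(bp, rxy)    # Equation [2] applied to BP

print(f"foldback with RXX:   {r_fold:.4f}")
print(f"foldback with RHOXX: {rp_fold:.4f}")
```

Using a linear solve rather than an explicit matrix inverse is a numerical convenience in this sketch; it yields the same weights as the inverse written in the equations.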


This  opportunity,  using  the  population  correlations  to 
compute  the  regression  weights,  is  a  luxury  that  is  not  ordinari¬ 
ly  available.  However,  given  that  they  were  used,  it  seemed 
problematic  to  then  apply  the  shrinkage  formulas  as  though  the 
sample  correlations  had  been  used  in  computing  the  regression 
weights.  Given  the  divergence  from  the  conditions  assumed  in  the 
derivations  for  the  shrinkage  formulas,  it  seemed  questionable  to 
assume  that  the  known  properties  of  the  resulting  estimators  of 
validity  would  apply  in  this  case. 

In  summary,  two  troublesome  issues  arose  in  the  attempts  to 
compare  the  absolute  and  discriminant  validities  of  the  synthetic 
validity  and  the  empirical  validity  approaches. 
1. The estimation of validity differs between the two
approaches,  namely,  the  estimates  using  syntheti¬ 
cally  derived  weights  are  not  subject  to  bias  from 
sample-specific  idiosyncrasies  as  are  the  estimates 
using  least  squares  regression  weights  (although 
attempts  were  made  to  correct  the  latter  for  bias 
by  using  known  formulas) . 

2.  For  the  initial  set  of  analyses,  the  least  squares 
estimation  of  weights  and  the  resulting  estimates 
of  validity  for  the  empirical  approach  were  comput¬ 
ed  using  population  (rather  than  sample)  correla¬ 
tions,  which  casts  some  doubt  upon  the  properties 
of  the  validity  estimators,  i.e.,  the  foldback 
correlations.

Since  little  is  known  about  the  characteristics  of  the 
validity  estimates  computed  in  the  specific  conditions  of  this 
study,  we  chose  to  investigate  the  possibility  of  altered  proper¬ 
ties  by  using  the  Monte  Carlo  method.  In  this  method,  the  condi¬ 
tions  of  sampling  are  approximated  by  randomly  generating  samples 
from  populations  which  are  reasonably  similar  to  those  under 
which  the  "real"  data  of  this  project  were  obtained.  Two  ques¬ 
tions  motivated  the  Monte  Carlo  studies: 

1.  What  effect  does  using  the  correlations  of  predic¬ 
tors  from  the  population  have  upon  the  estimates  of 
absolute  validity,  and 

2.  How  is  the  index  of  discriminant  validity  affected 
by  the  mixture  of  estimators  of  two  different 
types? 

Finally,  we  also  were  concerned  about  the  possible  effect 
that  the  Batch  A  MOS  versus  Batch  Z  MOS  distinction  might  have  on 
our  results.  The  criterion  measures  differ  for  the  Batch  A  and 
Batch  Z  MOS,  primarily  in  that  the  Batch  A  MOS  included  "hands- 
on"  or  work-sample  tests  and  the  Batch  Z  MOS  did  not  include  such 
measures.  Therefore,  we  wished  to  run  separate  simulations  for 
all  the  MOS  together,  the  Batch  A  MOS  alone,  and  the  Batch  Z  MOS 
alone.


Method:  Simulations 


The  Monte  Carlo  studies  were  designed  to  exactly  duplicate 
the  statistical  processes  used  in  the  Synthetic  Validity  Project. 
Unlike  the  "real"  project,  however,  we  were  able  to  draw  the 
samples from known populations.

The  samples  were  generated  as  pseudo-random  numbers  drawn 
from  populations  with  known  correlations  between  predictors  and 
between  predictors  and  criteria.  In  order  to  make  the  popula¬ 
tions  similar  to  those  in  the  Synthetic  Validity  research,  we 
used  the  actual  correlations  realized  in  the  Synthetic  Validity 
sample  and  assumed  them  to  be  a  population,  that  is,  the  popula¬ 
tions  sampled  were  based  upon  the  following: 


a.  RHOXX,  which  is  a  26  by  26  matrix  of  population 
correlations  between  predictors,  was  taken  to  be 
the  correlations  of  predictors  based  upon  7045 
cases  (and  corrected  for  range  restriction,  as 
explained  in  Chapter  4,  p.  4-6),  and 

b.  RHOXY,  which  is  a  26  by  1  vector  of  mean  correla¬ 
tions  of  each  predictor  with  the  Core  Technical 
Proficiency  criterion  score  or  the  Overall  Perform¬ 
ance  criterion  score.  The  means  were  computed 
across  all  18  MOS  or  across  9  MOS  when  only  Batch  A 
or  Batch  Z  MOS  were  included.  All  correlations  had 
been  corrected  for  range  restriction. 


Thus, the pseudo-random samples of observations were generated
from a population in which the correlations among predictors
are identical to those used in the Synthetic Validity research, but
the correlations of the 26 predictors with the Core Technical or
Overall Performance criterion (i.e., the predictors' individual
validity coefficients) are set to the same value for all MOS,
namely, the mean computed across all MOS. This simulates a
condition in which absolute validity is nearly identical to that
observed in the actual Synthetic Validity data but discriminant
validity equals zero, unlike the condition observed in the actual data.
Discriminant  validities  of  .07  for  the  Core  Technical  criterion 
and  .04  for  the  Overall  Performance  criterion  were  found  when 
RHOXX  was  used  in  the  validation  analyses  (see  Addendum  Table  5) . 

The  population  regression  weights  and  the  population  multi¬ 
ple  correlation  can  be  computed  using  RHOXX  and  RHOXY  as  follows: 

BPOP = RHOXX^{-1} RHOXY

and

rpop = (BPOP' RHOXY)^{1/2}
where BPOP is the 26 by 1 vector of true regression weights and
rpop  is  the  population  multiple  correlation. 

For each MOS, a sample of n_i multivariate observations was
drawn from the population, where n_i is the sample size realized
for the ith MOS in the Synthetic Validity study. (See Table 4.1
in Chapter 4 for a list of the sample sizes, which varied from 69
to 597.)
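
The population quantities and the sampling step described above can be sketched as follows. This is a hedged illustration, not the project's program: it assumes multivariate normal scores (the text does not state the distribution used), it uses an invented three-predictor population in place of the 26-predictor matrices, and the function name sample_correlations is hypothetical.

```python
import numpy as np

def sample_correlations(rhoxx, rhoxy, n, rng):
    """Draw n observations from the population and return (RXX_i, RXY_i)."""
    p = len(rhoxy)
    # Joint (p+1) x (p+1) population correlation matrix: predictors, then criterion.
    pop = np.block([[rhoxx, rhoxy[:, None]],
                    [rhoxy[None, :], np.ones((1, 1))]])
    # Assumption: scores are multivariate normal with zero means and unit variances.
    data = rng.multivariate_normal(np.zeros(p + 1), pop, size=n)
    r = np.corrcoef(data, rowvar=False)
    return r[:p, :p], r[:p, p]

# Invented population values (assumption: for illustration only).
rhoxx = np.array([[1.0, 0.25, 0.25],
                  [0.25, 1.0, 0.35],
                  [0.25, 0.35, 1.0]])
rhoxy = np.array([0.5, 0.4, 0.3])

bpop = np.linalg.solve(rhoxx, rhoxy)      # BPOP = RHOXX^{-1} RHOXY
rpop = float(np.sqrt(bpop @ rhoxy))       # rpop = (BPOP' RHOXY)^{1/2}

rng = np.random.default_rng(0)
rxx_i, rxy_i = sample_correlations(rhoxx, rhoxy, n=69, rng=rng)  # smallest MOS sample size
print(f"population multiple correlation: {rpop:.4f}")
```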

Thus,  for  each  of  the  MOS,  it  was  possible  to  obtain  "sam¬ 
ple"  correlation  matrices: 

RXX_i = 26 by 26 matrix of the correlations of predictors
based upon n_i pseudo-random observations for the ith MOS, and

RXY_i = 26 by 1 vector of the correlations of n_i
pseudo-random criterion scores with each of
the 26 predictors in the ith MOS.


Regression  weights  were  obtained  in  two  ways: 


1.  The  "usual"  way  in  which  the  sample  correlations 
for  both  predictors  and  criteria  were  used,  i.e., 

B_i = RXX_i^{-1} RXY_i,

and,

2. Substituting the population correlations, RHOXX, in
place of the sample correlations, RXX_i, in the
computation of weights, i.e.,

BP_i = RHOXX^{-1} RXY_i.


The  reason  for  computing  the  weights  in  both  ways,  as  ex¬ 
plained  in  the  introduction  to  this  section,  is  that  the  first 
way  is  the  usual  way  to  do  it  while  the  second  way  was  tried  out 
in  the  Synthetic  Validity  Project  research,  due  to  the  availabil¬ 
ity  of  population  correlations  for  predictors. 

For each of the randomly generated samples, eight statistics
were computed, four for each of the two sets of weights:

1. Two foldback correlations, which are the correlations
of each composite with the criterion in the
sample, i.e.,

rf_i = (B_i' RXY_i)^{1/2}

and

rpf_i = (BP_i' RXY_i)^{1/2}.

2. Two Claudy estimates of the population multiple
correlation coefficient:

rc_i = the correlation obtained by applying Claudy
Equation 12 (1978) to rf_i, and

rpc_i = the correlation obtained by applying Claudy
Equation 12 to rpf_i.

3. Two Rozeboom estimates of the validity coefficient:

rr_i = the correlation obtained by applying Rozeboom
Equation 8 (1978) to rf_i,

and

rpr_i = the correlation obtained by applying Rozeboom
Equation 8 to rpf_i.

4. Two "true" validities, obtained by applying each
set of weights to the population from which the
samples were drawn:

v_i = (B_i' RHOXY)(B_i' RHOXX B_i)^{-1/2}

and

vp_i = (BP_i' RHOXY)(BP_i' RHOXX BP_i)^{-1/2}.

Of  course,  it  is  not  ordinarily  possible  to  compute  the  last 
pair  of  statistics,  the  true  validities,  because  the  populations 
are  not  routinely  available. 
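
The "true" validity of a set of weights can be sketched directly from the formula in item 4. The example below is illustrative only: the population matrices are invented, and the helper name true_validity is hypothetical. It also shows a property that follows from the algebra: no set of weights can have a true validity above the population multiple correlation, which is attained by the population weights.

```python
import numpy as np

def true_validity(b, rhoxx, rhoxy):
    """Correlation in the population between the composite b'x and the criterion."""
    return float((b @ rhoxy) / np.sqrt(b @ rhoxx @ b))

# Invented population values (assumption: for illustration only).
rhoxx = np.array([[1.0, 0.25, 0.25],
                  [0.25, 1.0, 0.35],
                  [0.25, 0.35, 1.0]])
rhoxy = np.array([0.5, 0.4, 0.3])

bpop = np.linalg.solve(rhoxx, rhoxy)
rpop = float(np.sqrt(bpop @ rhoxy))

# Weights that differ from the population weights, as sample-based weights would.
b_other = bpop + np.array([0.10, -0.05, 0.08])

v_pop = true_validity(bpop, rhoxx, rhoxy)      # equals rpop
v_other = true_validity(b_other, rhoxx, rhoxy)  # cannot exceed rpop

print(f"rpop = {rpop:.4f}, v(bpop) = {v_pop:.4f}, v(other) = {v_other:.4f}")
```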
For sampling conditions corresponding to each group/criterion
set, we computed the mean of these coefficients across all
MOS. These means are the absolute validities. We also computed
the mean off-diagonal validities, that is, the mean of the
validity coefficients obtained by applying the weights derived for a
given MOS to the other MOS:

cv_ij = (B_i' RXY_j)(B_i' RHOXX B_i)^{-1/2}

cvp_ij = (BP_i' RXY_j)(BP_i' RHOXX BP_i)^{-1/2}

(Note that if i = j, then cv_ii = rf_i and cvp_ii = rpf_i.)


We repeated this process 30 times (each repetition is sometimes
called a realization in Monte Carlo studies) and
then computed the means and standard deviations of the statistics
across the 30 realizations.

After  obtaining  the  mean  values  for  the  various  coeffi¬ 
cients,  we  could  then  compute  absolute  and  discriminant  validi¬ 
ties  just  as  we  did  in  all  the  other  Synthetic  Validity  research. 
The mean of the "shrunken" validities is taken as the absolute
validity  and  the  discriminant  validity  is  obtained  by  subtracting 
the  mean  of  the  off-diagonal  validity  coefficients  from  the 
absolute  validity. 
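
The index computation just described can be sketched as follows, assuming the validities are arranged in a square matrix whose entry (i, j) is the validity of the weights derived for MOS i when applied to MOS j, with the shrunken estimates on the diagonal. The matrix values and the function name absolute_and_discriminant are hypothetical.

```python
import numpy as np

def absolute_and_discriminant(v):
    """Return (absolute validity, discriminant validity) from a k x k validity matrix."""
    v = np.asarray(v, dtype=float)
    k = v.shape[0]
    absolute = np.trace(v) / k                            # mean of diagonal entries
    off_diag_mean = (v.sum() - np.trace(v)) / (k * (k - 1))
    return absolute, absolute - off_diag_mean

# Invented 3-MOS example (assumption: for illustration only).
v = [[0.60, 0.55, 0.55],
     [0.55, 0.60, 0.55],
     [0.55, 0.55, 0.60]]
absolute, discriminant = absolute_and_discriminant(v)
print(f"absolute = {absolute:.2f}, discriminant = {discriminant:.2f}")
```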


Results:  Simulations 


Addendum  Table  1  shows  a  summary  of  the  Monte  Carlo  simula¬ 
tion  runs.  There  are  six  sets  of  summaries,  one  for  each  of  the 
Group/Criterion  conditions:  All  MOS,  Batch  A  MOS,  or  Batch  Z  MOS 
for  the  Overall  Criterion  or  the  Core  Technical  Criterion.  Items 
to  note  in  this  table  are  the  standard  deviations  for  the  coeffi¬ 
cients.  These  are  the  standard  deviations  for  the  coefficients 
computed  across  the  30  realizations  and  indicate  the  amount  of 
sampling  error  incurred  in  the  Monte  Carlo  simulations.  Note 
that  the  standard  deviation  is  zero  for  the  "true  multiple  corre¬ 
lation"  since  there  is  only  one  such  value  for  each  set.  Almost 
all  of  the  standard  deviations  are  less  than  .02,  with  the  excep¬ 
tion  of  several  coefficients  for  the  simulated  Batch  Z  MOS,  which 
are  as  high  as  .0342 — indicating  that  there  is  more  sampling 
error for that condition. Even so, the largest standard error of
any mean coefficient is .006 (for the Rozeboom #8 RHOXX Batch
Z/Overall condition, standard error = .0342/square root of
30), and the more typical size is .002. Thus, we think 30
realizations were more than sufficient to allow the comparisons
that we wished to make.
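
The standard error quoted above is the standard deviation across realizations divided by the square root of the number of realizations. A minimal check of that arithmetic, using only the two figures given in the text (variable names are ours):

```python
import math

sd_largest = 0.0342      # largest standard deviation across realizations (from the text)
n_realizations = 30      # number of Monte Carlo realizations (from the text)

# Standard error of a mean coefficient computed over the realizations.
se_largest = sd_largest / math.sqrt(n_realizations)
print(round(se_largest, 3))
```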


Addendum Table 1 is fairly dense, and it is easier to understand
the results if the salient statistics are broken into
separate tables. Addendum Table 2 shows the foldback correlations
for the six conditions computed using RHOXX and RXX. Also
shown  are  the  "true  multiple  correlations"  so  that  the  relative 
bias  of  the  two  foldbacks  can  be  evaluated.  Examination  of  this 
table  shows  that  the  foldbacks  computed  using  the  population 
correlations  are  higher  than  the  foldbacks  computed  using  the 
sample  correlations  for  all  conditions,  and  therefore  exhibit 
more  bias  (i.e.,  they  are  more  discrepant  from  the  true  multiple 
correlation,  by  .01  to  .03).  The  greatest  bias  appears  for  the 
simulated  Batch  Z  conditions  using  the  population  correlation. 
Since  all  the  correction  formulas  are  based  on  the  foldback 
correlations,  it  is  evident  from  this  table  that  the  coefficients 
based  on  the  computations  using  population  predictor  correlations 
(RHOXX)  will  show  more  bias  than  those  based  on  sample  predictor 
correlations  (RXX) . 
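
The comparison in this paragraph can be reproduced from the entries of Addendum Table 2. The sketch below subtracts the true multiple correlation from each foldback and also takes the difference between the two foldbacks; the numbers are copied from the table, while the variable names are ours.

```python
# (population foldback, sample foldback, true multiple R) from Addendum Table 2
table2 = {
    "All MOS/Overall":   (0.7038, 0.6838, 0.6435),
    "Batch A/Overall":   (0.6758, 0.6614, 0.6351),
    "Batch Z/Overall":   (0.7360, 0.7096, 0.6581),
    "All MOS/Core Tech": (0.7170, 0.6962, 0.6600),
    "Batch A/Core Tech": (0.6726, 0.6588, 0.6302),
    "Batch Z/Core Tech": (0.7744, 0.7430, 0.6983),
}

pop_bias = {c: pop - true for c, (pop, samp, true) in table2.items()}
samp_bias = {c: samp - true for c, (pop, samp, true) in table2.items()}
extra_bias = {c: pop - samp for c, (pop, samp, true) in table2.items()}

for cond in table2:
    print(f"{cond:18s} RHOXX bias {pop_bias[cond]:.4f}  "
          f"RXX bias {samp_bias[cond]:.4f}  difference {extra_bias[cond]:.4f}")
```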

Addendum  Table  3  shows  the  Claudy  Equation  #12  coefficients 
with  the  true  multiple  correlations  for  the  six  conditions.  The 
same  pattern  of  results  holds  as  did  for  Addendum  Table  2,  that 
is,  the  Claudy  coefficients  based  on  computations  using  RHOXX 
show  more  positive  bias  than  do  the  Claudy  coefficients  based  on 
computations  using  RXX,  for  all  conditions.  Indeed,  the  "sample" 
Claudy  equation  is  very  close  to  the  true  multiple  correlation  in 
all  conditions,  whereas  the  bias  for  the  population  Claudy  equa¬ 
tion  shows  positive  bias  ranging  from  about  .01  to  .07.  Again, 
the  bias  is  higher  for  the  simulated  Batch  Z  conditions. 

Addendum  Table  4  shows  the  Rozeboom  Equation  #8  estimates, 
intended  to  estimate  the  validity  of  the  least  squares  equations, 
compared  to  the  "true  validities"  for  the  six  conditions.  Again, 
the  same  pattern  of  results  holds:  the  estimates  based  on  coef¬ 
ficients  computed  using  RHOXX  show  positive  bias  ranging  from  .02 
(Batch  A/Overall)  to  .08  (Batch  Z/Core  Tech) .  Note  that  the 
Rozeboom  equation  estimates  based  on  computations  using  sample 
predictor  correlations  show  a  slight  negative  bias,  about  -.01  in 
four  of  the  six  conditions,  but  are  much  closer  to  the  "true 
validity"  in  all  conditions.  Again,  it  appears  that  the  bias  is 
most  severe  for  the  simulated  Batch  Z  conditions. 

These  consistent  results  demonstrate  that  use  of  RHOXX,  the 
population predictor intercorrelations, rather than RXX, the
sample  predictor  intercorrelations,  in  computing  least  squares 
equations  leads  to  a  positive  bias  in  the  foldback  correlation. 
Addendum Table 2


Foldback Correlations Computed Using Population Correlations of
Predictors (RHOXX) and Using Sample Correlations of Predictors
(RXX) for Six MOS/Criterion Combinations


Simulated             Population    Sample      True
Group/Criterion       Foldback      Foldback    Multiple R

All MOS/Overall       .7038         .6838       .6435
Batch A/Overall       .6758         .6614       .6351
Batch Z/Overall       .7360         .7096       .6581
All MOS/Core Tech     .7170         .6962       .6600
Batch A/Core Tech     .6726         .6588       .6302
Batch Z/Core Tech     .7744         .7430       .6983

Note. Coefficients computed using population correlations of
predictors (RHOXX) and using sample correlations of predictors
(RXX), from a simulation of the Synthetic Validity
Project data base for the case with similar absolute
validity and with zero discriminant validity.
Addendum Table 3


Claudy Equation #12 Estimates of Population Multiple Correlations
and True Multiple Correlations for Six MOS/Criterion Combinations


Simulated             Population    Sample      True
Group/Criterion       Claudy #12    Claudy #12  Multiple R

All MOS/Overall       .6711         .6452       .6435
Batch A/Overall       .6514         .6355       .6351
Batch Z/Overall       .6953         .6591       .6581
All MOS/Core Tech     .6850         .6583       .6600
Batch A/Core Tech     .6478         .6326       .6302
Batch Z/Core Tech     .7744         .7014       .6983

Note. Coefficients computed using population correlations of
predictors (RHOXX) and using sample correlations of predictors
(RXX), from a simulation of the Synthetic Validity
Project data base for the case with similar absolute
validity and with zero discriminant validity.
Addendum Table 4


Rozeboom Equation #8 Estimates of Validities and True Validities
for Six MOS/Criterion Combinations


Simulated             Population    Population    Sample        Sample
Group/Criterion       Rozeboom #8   True Validity Rozeboom #8   True Validity

All MOS/Overall       .6290         .5905         .5946         .6014
Batch A/Overall       .6221         .5965         .6043         .6078
Batch Z/Overall       .6424         .5913         .5895         .6034
All MOS/Core Tech     .6442         .6087         .6077         .6208
Batch A/Core Tech     .6180         .5910         .6010         .6024
Batch Z/Core Tech     .7063         .6344         .6475         .6498

Note. Coefficients computed using population correlations of
predictors (RHOXX) and using sample correlations of predictors
(RXX), from a simulation of the Synthetic Validity
Project data base for the case with similar absolute
validity and with zero discriminant validity.
This, in turn, leads to a positive bias in the estimates of
population  multiple  correlations  and  population  validity  coeffi¬ 
cients  for  the  conditions  encountered  in  the  Synthetic  Validity 
research. 

Addendum Table 5 demonstrates this bias in the values of the
observed  absolute  validity  coefficients  and  the  discriminant 
validity  index.  The  first  column  in  this  table  shows  the  actual, 
observed  values  for  absolute  and  discriminant  validity,  using  the 
Rozeboom  correction  on  coefficients  computed  from  least  squares 
equations  using  RHOXX.  The  second  column  presents  values  for  the 
same  coefficients  and  conditions,  except  that  they  come  from  a 
population  that  actually  has  zero  discriminant  validity  by  de¬ 
sign,  i.e.,  from  the  simulations.  Note  that  there  are  positive 
values  for  discriminant  validity,  which  should  equal  zero,  for 
these conditions, indicating the extent of bias that has occurred.
For example, for the All MOS/Overall condition the
observed data show absolute and discriminant validity values
of .63 and .04, respectively, and the values from the zero
discriminant validity simulation are the same. Thus, the appearance
of discriminant validity for this condition appears to be
attributable to the positive bias in the calculation of least
squares equations using RHOXX.
Further  evidence  for  this  is  shown  in  the  third  column.  This 
column  shows  the  Rozeboom  coefficients  for  the  zero  discriminant 
validity  simulation  when  the  least  squares  equations  are  computed 
using  the  sample  predictor  intercorrelations  (RXX) .  Note  that 
there  is  a  much  smaller,  slightly  negative  bias  (about  -.01  in 
four  of  the  conditions,  and  no  bias  for  two  of  the  conditions) . 

Thus,  the  answer  to  our  first  research  question,  what  is  the 
effect  of  using  RHOXX  in  computing  least  squares  equations  on  the 
estimates  of  absolute  validity,  is  that  the  estimates  are  posi¬ 
tively  biased,  which  in  turn  leads  to  a  positive  bias  in  the 
index  of  discriminant  validity,  our  second  research  question. 
Recall  that  this  index  is  simply  the  difference  between  the 
absolute  validity  and  the  mean  off-diagonal  validity,  which  does 
not  include  any  foldback  correlations. 

This  finding  of  positive  bias  held  across  all  Simulated 
Group/Criterion  combinations,  but  was  most  pronounced  for  Batch 
Z,  compared  to  Batch  A  or  All  MOS.  One  possible  explanation  for 
this  finding  is  the  difference  in  Batch  Z  and  Batch  A  sample 
sizes,  which  were  used  in  the  simulation.  The  mean  sample  size 
for  the  nine  Batch  Z  MOS  is  296  (standard  deviation  =  143,  range 
=  69  to  544)  compared  to  448  for  the  Batch  A  MOS  (standard  devia¬ 
tion  =  82,  range  =  289  to  597) . 

Addendum  Table  6  provides  some  more  information  bearing  on 
the  relationship  of  sample  size  to  these  results.  The  table 
Addendum Table 5


Absolute and Discriminant Validity Coefficients Obtained Using
Rozeboom Equation #8 Estimates of Validities for Six
MOS/Criterion Combinations


                                    Simulated      Simulated
Simulated             Observed      Zero Disc.     Zero Disc.
Group/                Rozeboom      Rozeboom #8    Rozeboom #8
Criterion             #8            Using RHOXX    Using RXX

All MOS/Overall       .63/.04       .63/.04        .59/-.01
Batch A/Overall       .60/.00       .62/.02        .60/-.01
Batch Z/Overall       .66/.09       .64/.05        .59/-.01
All MOS/Core Tech     .67/.07       .64/.03        .61/-.01
Batch A/Core Tech     .63/.06       .62/.03        .60/.00
Batch Z/Core Tech     .70/.08       .71/.07        .65/.00

Note. Entries are absolute validity/discriminant validity.
Computed using population correlations of predictors (RHOXX),
as observed in the Synthetic Validity Project data base for
six MOS/Criterion combinations, the corresponding
coefficients from a simulation of the Synthetic Validity
data base with similar absolute validity but with zero
discriminant validity, and Rozeboom estimates computed
using sample correlations of predictors (RXX) for the zero
discriminant validity simulation.
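Addendum Table 6

[Rows for the individual sample sizes are not recoverable; only the row of means survives.]

Columns 2-7 (weights computed using RHOXX) and columns 8-13 (weights
computed using RXX), each in the order: foldback, Rozeboom #8,
Rozeboom bias, true validity, Claudy #12, Claudy bias.

Means   .7038  .6290  .0385  .5905  .6711  .0276  .6838  .5946  -.0068  .6014  .6452  .0017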


shows  the  results  of  one  of  the  six  Monte  Carlo  studies  broken 
down  into  sample  sizes  that  correspond  to  the  sample  sizes  for 
the  18  MOS  in  the  Synthetic  Validity  Project.  The  results  are 
rank ordered by group size. The smallest sample size is 69 and
the  largest  is  597.  The  entries  are  mean  values  based  upon  30 
realizations  each,  i.e.,  30  pseudo-random  samples  of  the  indicat¬ 
ed  size  drawn  from  the  population  corresponding  to  the  Overall 
Performance  criterion  with  no  discriminant  validity. 

As  expected,  the  foldback  correlations  decrease  and  true 
validities  increase  as  sample  size  increases.  Moreover,  the 
foldback  coefficients  for  the  weights  computed  using  RHOXX  are 
consistently  higher  than  the  foldback  coefficients  for  the 
weights  computed  using  the  30  independent  sample  correlation 
matrices,  RXX  (compare  columns  two  and  eight) .  The  magnitude  of 
this  difference  is  related  to  the  group  size. 

Somewhat unexpectedly, the true validities for the weights
computed  using  RHOXX  are  slightly,  but  uniformly,  lower  than  the 
true  validities  for  the  weights  computed  using  the  30  independent 
RXX's  (compare  columns  five  and  eleven). 

The  entries  for  bias  for  the  Rozeboom  Equation  #8  estimator 
consist  of  the  differences  between  each  Rozeboom  estimate  of  the 
validity  and  the  corresponding  true  validity.  The  entries  for 
bias  for  the  Claudy  Equation  #12  estimator  consist  of  the  differ¬ 
ences  between  each  Claudy  estimator  and  the  population  multiple 
correlation. 

Clearly,  the  bias  of  both  the  Rozeboom  and  Claudy  estimators 
is  greater  for  the  smaller  group  sizes  when  the  weights  are 
computed  using  RHOXX.  It  is  reasonable  to  assume  that  the  small 
reversals  that  occur  in  the  downward  trend  in  the  bias  columns 
for  that  condition  can  be  attributed  to  sampling  error  since  all 
samples  were  drawn  from  the  same  population,  i.e.,  no  difference 
in  the  correlations  with  the  criterion  (RXY) . 

Not  surprisingly,  both  the  Rozeboom  and  Claudy  estimates 
seem  to  perform  quite  well.  The  bias  with  respect  to  their 
corresponding  parameters  is  very  small,  with  the  exception  of  the 
case  where  the  sample  size  is  only  69. 

Thus,  it  is  reasonable  to  conclude  that  the  phenomenon  which 
causes  bias  in  the  estimation  of  validity  when  using  population 
correlations  between  predictors  in  place  of  sample  correlations 
is  highly  related  to  sample  size. 

In  summary,  these  results  indicate  that  using  the  population 
correlation  matrix,  RHOXX,  in  the  computations  of  the  regression 
weights for each MOS rather than the MOS sample correlation
matrix, RXX, is probably not the best way to proceed. The result
seems to be that the true validity of composites computed from
such weights is less than it would have been if the individual
MOS  sample  correlation  matrices  had  been  used.  Moreover,  the 
estimates  of  validity  seem  to  have  been  inflated  by  using  RHOXX. 

Since  this  phenomenon  runs  contrary  to  conventional  wisdom 
and,  indeed,  contrary  to  the  intuition  of  these  researchers,  we 
feel  that,  although  these  Monte  Carlo  results  are  convincing, 
they  are  limited  to  a  specific  set  of  conditions.  We  do  not,  for 
instance,  know  what  to  expect  if  the  numbers  of  predictors  or  the 
population  multiple  correlation  were  varied.  Thus,  extrapolation 
of  this  phenomenon  to  other  conditions  may  be  premature  and 
further  study  is  indicated. 


Comparison  of  Results  Using  RHOXX  and  RXX 


Because  of  the  findings  reported  above,  we  recomputed  all 
the validation analyses. In the reanalysis, we used the MOS
sample  correlation  matrices,  RXX  corrected  for  range  restriction, 
in  place  of  RHOXX  which  had  also  been  corrected  for  range  re¬ 
striction.  These  are  the  results  reported  in  Chapter  4.  We  also 
prepared  some  tables  comparing  the  results  obtained  when  the  two 
different  matrices  (RHOXX  and  RXX)  were  used.  We  performed  these 
comparisons  primarily  to  determine  the  nature  and  size  of  the 
differences  in  analytic  results  when  the  two  different  types  of 
predictor  intercorrelation  matrices  were  utilized. 

To repeat, the initial set of analyses used RHOXX, the
single population matrix of predictor intercorrelations, in
computing least squares equations and validity coefficients for
all equations (least squares and synthetic). The second set of
analyses used RXX, the sample matrix of predictor
intercorrelations. Thus, there was a separate RXX for each MOS in
the second set of analyses.

Addendum Tables 7 and 8 compare results for computing the
least squares equations and the foldback correlations. Addendum
Table 7 shows the foldback correlations for the least squares
equations using all 26 predictors for each of the 18 MOS,
computed using RHOXX and using RXX, for both Core Technical
Proficiency and Overall Performance. The mean values across all
18 MOS are nearly identical for Core Technical Proficiency, and
there is about a .01 difference for Overall Performance. The
standard deviations of the coefficient values, computed across
the 18 MOS, are very similar for the two methods (and for the two
criteria).


Addendum Table 7

Foldback Correlation Coefficients for Least Squares Equations
Using 26 Predictors, Computed Using RHOXX and RXX

          Core Technical Proficiency      Overall Performance
                26 Predictors                26 Predictors
MOS          RHOXX        RXX             RHOXX        RXX

11B          .749         .747            .689         .699
12B          .707         .702            .688         .679
13B          .511         .492            .526         .498
16S          .604         .585            .661         .651
19K          .681         .677            .721         .708
27E          .844         .861            .839         .851
31C          .676         .685            .667         .633
51B          .947         .932            .943         .914
54B          .772         .776            .727         .728
55B          .735         .756            .652         .616
63B          .725         .747            .614         .623
67N          .847         .843            .827         .838
71L          .674         .681            .659         .650
76Y          .683         .686            .622         .613
88M          .638         .640            .624         .601
91A          .781         .784            .692         .680
94B          .772         .771            .641         .635
95B          .693         .676            .753         .733

Mean         .724         .725            .697         .686
S.D.         .096         .099            .094         .098


Addendum  Table  8  shows  the  same  data  when  only  the  three 
ASVAB  predictors  are  entered  into  the  least  squares  equations. 
Once  again,  the  mean  coefficients  are  very  similar  across  the  two 
methods  of  computation.  The  standard  deviation  of  the  coeffi¬ 
cients  is  slightly  higher,  about  .01  for  Core  Technical  and  .02 
for  Overall,  for  the  coefficients  computed  using  RXX. 
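
The comparison of means can be checked directly from the tabled values. The short Python sketch below uses the Core Technical Proficiency columns transcribed from Addendum Table 8; the function name and the choice to report the single largest per-MOS difference are illustrative additions, not part of the original analysis.

```python
from statistics import mean

# (MOS, RHOXX, RXX) foldback correlations for Core Technical Proficiency,
# transcribed from Addendum Table 8 (three ASVAB predictors).
TABLE_8_CTP = [
    ("11B", .694, .680), ("12B", .675, .671), ("13B", .409, .388),
    ("16S", .534, .508), ("19K", .627, .627), ("27E", .731, .787),
    ("31C", .625, .635), ("51B", .797, .837), ("54B", .705, .727),
    ("55B", .667, .713), ("63B", .654, .680), ("67N", .811, .811),
    ("71L", .591, .599), ("76Y", .645, .646), ("88M", .595, .595),
    ("91A", .720, .726), ("94B", .704, .695), ("95B", .639, .627),
]


def summarize(rows):
    """Return (mean RHOXX, mean RXX, MOS with largest |RXX - RHOXX|, that difference)."""
    rho = [r for _, r, _ in rows]
    rxx = [x for _, _, x in rows]
    diffs = {mos: round(x - r, 3) for mos, r, x in rows}
    largest = max(diffs, key=lambda m: abs(diffs[m]))
    return round(mean(rho), 3), round(mean(rxx), 3), largest, diffs[largest]


print(summarize(TABLE_8_CTP))  # (0.657, 0.664, '27E', 0.056)
```

The means reproduce the tabled values of .657 and .664. Individual MOS can differ by more than the means do; the largest difference in these columns is .056, for MOS 27E.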

The  results  shown  in  Addendum  Tables  7  and  8  indicate  that 
the  use  of  RHOXX  versus  the  use  of  RXX  made  little  difference  in 
the  observed  results  for  least  squares  equations  computed  for 
each  MOS.  However,  for  the  sake  of  completeness,  the  least 
squares  equations  developed  using  RHOXX  are  shown  in  the  appendix 
to  this  addendum.  These  equations  can  be  compared  to  their 
counterparts  in  Appendix  E  to  the  main  body  of  the  report,  if  the 
reader  desires. 

Addendum  Table  9  shows  the  absolute  and  discriminant  validi¬ 
ties  for  predicting  Core  Technical  Proficiency  computed  using  RXX 
and  RHOXX,  for  the  various  methods  of  forming  prediction  equa¬ 
tions  via  synthetic  techniques.  Although  predictor  intercorrela¬ 
tions  are  not  used  in  developing  the  synthetic  equations  (no 
sample  predictor  or  criterion  data  are  used  to  develop  any  syn¬ 
thetic  equation) ,  they  are  used  in  computing  the  validity  coeffi¬ 
cient  for  the  equations.  Therefore,  we  computed  the  absolute  and 
discriminant  validities  using  RHOXX  and  using  RXX.  Note  that 
Addendum  Table  9  also  shows  the  absolute  and  discriminant  validi¬ 
ties  for  the  least  squares  equations  computed  both  ways.  (We  do 
not  here  describe  the  various  types  of  synthetic  equations,  in 
the  interests  of  conserving  space.  The  interested  reader  may 
refer  to  Chapter  4  of  the  main  report  to  find  descriptions.) 

The  results  in  Addendum  Table  9  are  very  similar  across  the 
two  methods  (RHOXX  versus  RXX) .  No  difference  is  greater 
than  .01  for  either  absolute  or  discriminant  validity. 

Addendum  Table  10  shows  the  absolute  and  discriminant  valid¬ 
ities  for  predicting  Overall  Performance  when  using  RHOXX  and 
when  using  RXX.  Once  again,  there  is  very  little  difference 
between  the  two  methods  for  the  synthetically  formed  equations. 
The  largest  difference  is  .01  for  either  absolute  or  discriminant 
validity.  However,  the  least  squares  equation  using  all  26 
predictors  shows  a  .02  reduction  in  absolute  validity  (.63 
vs  .61)  and  a  .03  reduction  in  discriminant  validity  (.04  vs  .01) 
when  the  sample  predictor  correlations  (RXX)  are  used. 

For the most part, then, it seemed to make little difference
whether RHOXX or RXX was used to form least squares equations or
to compute validity coefficients. The one case where a noticeable
difference appeared was for predicting Overall Performance using
all 26 predictors in a least squares equation. In that case, the
results using RXX showed a lower value for the absolute validity,
which is the expected result given the findings of the
simulations reported earlier in this addendum. For completeness,
the appendix to this addendum includes tables showing validity
coefficients computed using RHOXX for all synthetic and least
squares methods. The interested reader can compare these to the
tables in Appendix F to the main body of the report.


Addendum Table 8

Foldback Correlation Coefficients for Least Squares Equations
Using Three ASVAB Predictors, Computed Using RHOXX and RXX

          Core Technical Proficiency      Overall Performance
              3 ASVAB Predictors           3 ASVAB Predictors
MOS          RHOXX        RXX             RHOXX        RXX

11B          .694         .680            .592         .580
12B          .675         .671            .593         .581
13B          .409         .388            .402         .355
16S          .534         .508            .532         .514
19K          .627         .627            .614         .602
27E          .731         .787            .701         .760
31C          .625         .635            .596         .550
51B          .797         .837            .658         .741
54B          .705         .727            .639         .652
55B          .667         .713            .514         .497
63B          .654         .680            .488         .489
67N          .811         .811            .780         .797
71L          .591         .599            .546         .556
76Y          .645         .646            .548         .543
88M          .595         .595            .542         .516
91A          .720         .726            .586         .573
94B          .704         .695            .532         .514
95B          .639         .627            .669         .656

Mean         .657         .664            .585         .582
S.D.         .090         .103            .084         .104


Addendum Table 9

Absolute and Discriminant Validities Computed Using RHOXX and RXX
by Synthetic Validity and Least Squares Methods: Core Technical
Proficiency

                                   Using RHOXX             Using RXX
                              Absolute  Discriminant  Absolute  Discriminant

MOS Mean Importance
  X Validity                     .55        .00          .55        .00
  0-1 Attribute Weights          .63        .01          .63        .01
  0-X Attribute Weights          .63        .01          .64        .01
  ASVAB Reduction                .64        .00          .64        .00

MOS Threshold Importance
  X Validity                     .56        .01          .56        .01
  0-1 Attribute Weights          .61        .02          .62        .02
  0-X Attribute Weights          .62        .02          .62        .02
  ASVAB Reduction                .64        .00          .65        .00

Cluster Mean Importance
  X Validity                     .55        .00          .55        .00
  0-1 Attribute Weights          .63        .00          .63        .00
  0-X Attribute Weights          .63        .00          .64        .00
  ASVAB Reduction                .63        .00          .64        .00

Cluster Threshold Importance
  X Validity                     .57        .01          .57        .01
  0-1 Attribute Weights          .63        .01          .63        .01
  0-X Attribute Weights          .63        .01          .64        .01
  ASVAB Reduction                .64        .00          .64        .00

Least Squares, All 26 Predictors,
  Rozeboom Corrected             .67        .07          .67        .06

Least Squares, ASVAB Predictors
  Only, Rozeboom Corrected       .65        .03          .66        .03


Addendum Table 10

Absolute and Discriminant Validities Computed Using RHOXX and RXX
by Synthetic Validity and Least Squares Methods: Overall
Performance

                                   Using RHOXX             Using RXX
                              Absolute  Discriminant  Absolute  Discriminant

MOS Mean Importance
  X Validity                     .57        .00          .56        .00
  0-1 Attribute Weights          .59        .00          .59        .00
  0-X Attribute Weights          .59        .00          .59        .00
  ASVAB Reduction                .57        .00          .57        .00

MOS Threshold Importance
  X Validity                     .57        .00          .56        .00
  0-1 Attribute Weights          .57        .01          .57        .01
  0-X Attribute Weights          .57        .01          .57        .01
  ASVAB Reduction                .57        .00          .57        .00

Cluster Mean Importance
  X Validity                     .57        .00          .56       -.01
  0-1 Attribute Weights          .59        .00          .59        .00
  0-X Attribute Weights          .59        .00          .59        .00
  ASVAB Reduction                .57        .00          .57        .00

Cluster Threshold Importance
  X Validity                     .57        .00          .56        .00
  0-1 Attribute Weights          .56        .00          .56        .00
  0-X Attribute Weights          .57        .00          .56        .00
  ASVAB Reduction                .57        .00          .57        .00

Least Squares, All 26 Predictors,
  Rozeboom Corrected             .63        .04          .61        .01

Least Squares, ASVAB Predictors
  Only, Rozeboom Corrected       .57        .01          .57        .01

We  also  compared  the  RHOXX  versus  the  RXX  results  obtained 
in  the  investigations  of  the  synthetic  and  validity  generaliza¬ 
tion  models.  Recall  that  we  developed  three  types  of  "validity 
generalization"  equations  on  the  nine  Batch  A  MOS  and  applied 
them  to  the  nine  Batch  Z  MOS  (See  Chapter  4,  Section  entitled 
"Comparison  of  Synthetic  Validation  Model  to  Validity  Generaliza¬ 
tion  Model") .  These  were  labeled  the  Batch  A  "MOS-Match"  Least 
Squares,  the  Batch  A  Cluster  Least  Squares,  and  the  Batch  A 
General  Least  Squares  methods.  In  addition,  we  computed  least 
squares  equations  for  each  of  the  Batch  Z  MOS  using  their  sample 
data.  These  equations  were  called  "Own  Least  Squares."  Finally, 
we  applied  the  various  synthetic  methods  to  the  Batch  Z  MOS. 
Addendum  Tables  11  through  14  present  data  comparing  these  meth¬ 
ods  when  RHOXX  is  used  and  when  RXX  is  used  to  develop  equations 
(for  least  squares  methods)  and  to  compute  validity  coefficients 
(for  both  least  squares  and  synthetic  methods) . 

Addendum Table 11 shows the results when least squares
equations developed on Batch A MOS are applied to Batch Z MOS
and RHOXX is used. Addendum Table 12 shows the same results when
RXX is used. Note that all these coefficients are
cross-validities and require no adjustments. The first comparison
of note is the "transported" mean validity coefficient, i.e., the
mean of the coefficients obtained when the equation for a Batch A
MOS is applied to the Batch Z MOS that it "matches." (Here,
"match" means that the Army Task Questionnaire profile for the
Batch Z MOS correlated highest with the profile of that Batch A
MOS.) These coefficients are .64 when RHOXX is used and .67 when
RXX is used (see footnotes in the two tables). Thus, we obtain
higher transported validity when we use RXX.
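
The two transported means can be reproduced from the starred entries of Addendum Tables 11 and 12. The Python sketch below transcribes those nine matched coefficients; the dictionary layout and function name are illustrative. The use of the population standard deviation is an assumption, since the tables do not say which form was used. For these values the sample form rounds to the same two decimals.

```python
from statistics import mean, pstdev

# Starred ("matched") coefficients transcribed from Addendum Tables 11 and 12,
# keyed by (Batch A equation, Batch Z MOS); values are (RHOXX, RXX).
MATCHED = {
    ("11B", "12B"): (.64, .64), ("11B", "16S"): (.54, .50),
    ("13B", "54B"): (.62, .70), ("31C", "27E"): (.62, .74),
    ("63B", "67N"): (.70, .76), ("71L", "76Y"): (.60, .59),
    ("88M", "51B"): (.77, .84), ("88M", "55B"): (.63, .62),
    ("88M", "94B"): (.62, .63),
}


def transported_summary(matched):
    """Mean and S.D. of the matched coefficients for each matrix type."""
    rho = [v[0] for v in matched.values()]
    rxx = [v[1] for v in matched.values()]
    return {
        "RHOXX": (round(mean(rho), 2), round(pstdev(rho), 2)),
        "RXX": (round(mean(rxx), 2), round(pstdev(rxx), 2)),
    }


print(transported_summary(MATCHED))
# {'RHOXX': (0.64, 0.06), 'RXX': (0.67, 0.1)}
```

The means of .64 and .67 agree with the values cited in the text and with the "MOS-Match" row of Addendum Table 14.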

A second noteworthy point concerns the variance in
coefficients. When RXX is used, there is higher variance (compare
the row standard deviations and the footnote standard
deviations). This is understandable since a different RXX is used
for each MOS in computing the validity of the least squares
equations, whereas RHOXX is the same for each MOS.


Addendum Table 11

Validity Coefficients of Least Squares Equations Using RHOXX for
Predicting Core Technical Proficiency, When Developed on Project A
Batch A MOS and Applied to Project A Batch Z MOS: Highest Column
Entries Underlined

Equation
from                          Applied to Batch Z MOS
Batch A
MOS      12B   16S   27E   51B   54B   55B   67N   76Y   94B    Mean  S.D.

11B      .64*  .54*  .68   .81   .69   .63   .76   .58   .63    .66   .08
13B      .57   .49   .62   .77   .62*  .59   .68   .47   .51    .59   .09
19K      .64   .52   .68   .82   .68   .60   .75   .58   .62    .65   .09
31C      .61   .51   .62*  .65   .68   .60   .70   .62   .64    .63   .05
63B      .60   .38   .57   .66   .59   .62   .70*  .45   .45    .56   .10
71L      .52   .55   .54   .68   .65   .46   .59   .60*  .71    .59   .08
88M      .66   .49   .60   .77*  .67   .63*  .76   .56   .62*   .64   .08
91A      .63   .53   .70   .80   .71   .63   .76   .59   .64    .67   .08
95B      .62   .54   .67   .84   .71   .60   .74   .60   .69    .67   .08

Mean     .61   .51   .63   .76   .67   .60   .72   .56   .61
S.D.     .04   .05   .05   .07   .04   .05   .05   .06   .08

*Validity coefficient for Batch Z MOS using the equation developed on
the Batch A MOS that is most similar in terms of ATQ Profile
correlation; Mean = .64, S.D. = .06.

Addendum Table 12

Validity Coefficients of Least Squares Equations Using RXX for
Predicting Core Technical Proficiency, When Developed on Project A
Batch A MOS and Applied to Project A Batch Z MOS: Highest Column
Entries Underlined

Equation
from                          Applied to Batch Z MOS
Batch A
MOS      12B   16S   27E   51B   54B   55B   67N   76Y   94B    Mean  S.D.

11B      .64*  .50*  .70   .88   .71   .67   .77   .56   .65    .68   .10
13B      .62   .50   .65   .97   .70*  .70   .78   .43   .59    .66   .15
19K      .63   .50   .70   .83   .72   .58   .83   .56   .65    .67   .11
31C      .59   .46   .74*  .86   .72   .66   .74   .64   .66    .67   .11
63B      .55   .31   .61   .80   .62   .62   .76*  .38   .47    .57   .15
71L      .45   .50   .54   .76   .63   .47   .59   .59*  .70    .58   .10
88M      .64   .45   .65   .84*  .72   .62*  .80   .55   .63*   .66   .11
91A      .59   .48   .67   .89   .72   .64   .84   .54   .64    .67   .13
95B      .60   .53   .66   .82   .72   .57   .75   .61   .68    .66   .09

Mean     .59   .47   .66   .85   .70   .61   .76   .54   .63
S.D.     .06   .06   .05   .06   .04   .06   .07   .08   .06

*Validity coefficient for Batch Z MOS using the equation developed on
the Batch A MOS that is most similar in terms of ATQ Profile
correlation; Mean = .67, S.D. = .10.

Addendum Table 13 shows the validity coefficients for the
four cluster equations and the general equation when applied to
the Batch Z MOS, using RHOXX and using RXX. The underlined
coefficients indicate the coefficients for the appropriate
cluster equation for each MOS, that is, the cluster that the
Batch Z MOS most closely matches in terms of its Army Task
Questionnaire profile.
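
The matching rule described here (choose the cluster, or Batch A MOS, whose profile correlates most highly with that of the target MOS) can be sketched as follows. The profile values, their length, and the function names are hypothetical and are used only for illustration; actual Army Task Questionnaire profiles are not reproduced in this addendum.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two equal-length profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def best_match(target_profile, candidate_profiles):
    """Return the name of the candidate whose profile correlates highest."""
    return max(candidate_profiles,
               key=lambda name: pearson(target_profile, candidate_profiles[name]))


# Hypothetical four-item profiles (illustrative values only).
clusters = {
    "Mechanical": [5, 4, 1, 1],
    "Administrative": [1, 2, 5, 4],
    "Combat": [3, 5, 2, 1],
}
print(best_match([4.5, 4, 1.5, 1], clusters))  # Mechanical
```

The equation of the selected cluster (or matched MOS) is then the one whose validity is counted as "transported" for the target MOS.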

The  means  and  standard  deviations  for  the  columns  show  that 
the  general  equation  has  identical  results  across  RHOXX  and  RXX, 
which  is  not  unexpected  since  the  general  equation  averages  across 
all  Batch  A  MOS,  and  the  results  should  be  very  similar  to  those 
using  RHOXX.  The  four  cluster  equations  show  slightly  higher  mean 
validities  and  standard  deviations  where  RXX  is  used  (.01  to  .03 
higher).  The  mean  "transported"  validity  coefficient,  i.e.,  the 
mean  of  the  appropriate  cluster  validities,  is  also  slightly 
higher  (.02)  for  RXX. 

Addendum  Table  14  summarizes  the  absolute  and  discriminant 
validity  results  for  the  Batch  Z  MOS  when  all  the  methods  are 
applied  using  RHOXX  and  using  RXX.  The  absolute  validity  for  the 
"Own"  least  squares  (i.e.,  the  least  squares  equation  developed  on 
and  applied  to  each  Batch  Z  MOS)  is  the  same,  but  the  discriminant 
validity  decreased  from  .08  to  .05  for  RXX.  This  means  that  the 
off-diagonal  validity  coefficients  were  higher,  on  average,  for  the 
RXX  method. 
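
The relationship between the two indices can be illustrated with a small sketch. It assumes that absolute validity is the mean of the diagonal of a square validity matrix (each equation applied to its own MOS) and that discriminant validity is that mean minus the mean of the off-diagonal entries (each equation applied to the other MOS). Chapter 4 of the main report gives the authoritative definitions. The matrices below are hypothetical.

```python
def absolute_and_discriminant(validity):
    """validity[i][j]: validity of the equation for MOS i when applied to MOS j."""
    n = len(validity)
    diagonal = [validity[i][i] for i in range(n)]
    off_diagonal = [validity[i][j] for i in range(n) for j in range(n) if i != j]
    absolute = sum(diagonal) / n
    discriminant = absolute - sum(off_diagonal) / len(off_diagonal)
    return absolute, discriminant


# Hypothetical 3 x 3 validity matrices with the same diagonal.
lower_off = [[.70, .60, .65],
             [.55, .68, .60],
             [.62, .58, .72]]
higher_off = [[.70, .63, .68],
              [.58, .68, .63],
              [.65, .61, .72]]

for label, matrix in (("lower off-diagonal", lower_off),
                      ("higher off-diagonal", higher_off)):
    a, d = absolute_and_discriminant(matrix)
    print(f"{label}: absolute = {a:.2f}, discriminant = {d:.2f}")
```

With the diagonal held fixed, raising the off-diagonal entries leaves absolute validity unchanged and lowers discriminant validity, which is the pattern described above for the "Own" least squares equations.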

The  results  for  the  three  validity  generalization  methods 
are  interesting.  The  general  equation  shows  the  same  results 
when  RHOXX  and  RXX  are  used,  and  we  have  already  commented  on 
that.  However,  the  "MOS-Match"  shows  an  increase  of  .03  in 
absolute  validity  and  an  increase  of  .02  in  discriminant  validity 
when  RXX  is  used.  The  cluster  method  also  shows  an  increase  in 
absolute  validity  (.02),  but  a  decline  in  discriminant  validity. 
These  results,  though  intriguing,  should  not  be  given  undue 
weight  since  the  differences  are  small.  Still,  the  "MOS-Match" 
method  does  look  better  as  a  method  for  transporting  validity 
when  RXX  is  used  compared  to  when  RHOXX  is  used. 

The  synthetic  equation  results  are  within  .01  across  the  two 
methods  (RXX  versus  RHOXX) ,  except  for  the  discriminant  validity 
for  the  "Top  5  Stepwise  Reduction  Method"  (a  .02  decrease  for 
RXX)  and  the  absolute  validity  for  the  0-1  or  0-mean  Attribute 
Weights  combined  with  Threshold  Component  Weights  (a  .02  increase 
for  RXX) .  Again,  undue  weight  should  probably  not  be  placed  on 
these  small  differences. 

In general, the use of RHOXX versus RXX does not seem to
cause any major difference in the analytic results for the data
observed in this project. The simulation results did indicate
that use of RHOXX would lead to a positive bias when least
squares methods were used, but this result was only partially
seen in these data.


Addendum Table 13

Validity Coefficients of General and Cluster Least Squares
Equations for Predicting Core Technical Proficiency, Developed on
Batch A MOS and Applied to Batch Z MOS,¹ Using RHOXX and Using RXX

                           Validity Coefficients For:

           General      Mechanical   Administrative     Combat     Electronics
Batch Z    Equation      Equation       Equation       Equation      Equation
MOS      RHOXX   RXX   RHOXX   RXX    RHOXX   RXX    RHOXX   RXX   RHOXX   RXX

12B       .66    .65    .64    .64     .61    .60     .65    .64    .61    .61
16S       .55    .51    .45    .41     .57    .54     .55    .52    .51    .48
27E       .68    .74    .60    .66     .65    .71     .69    .73    .62    .71
51B       .82    .87    .73    .79     .78    .86     .85    .88    .66    .78
54B       .72    .74    .64    .68     .72    .72     .71    .73    .68    .70
55B       .65    .69    .64    .67     .57    .64     .63    .69    .60    .64
67N       .78    .78    .75    .77     .71    .73     .77    .77    .70    .70
76Y       .60    .61    .51    .52     .63    .63     .59    .59    .62    .62
94B       .66    .68    .55    .57     .71    .72     .65    .67    .64    .65

Mean      .68    .68    .61    .63     .66    .68     .68    .69    .63    .65
S.D.      .08    .08    .09    .11     .07    .09     .09    .10    .05    .08

                                                            RHOXX    RXX
Mean of appropriate cluster coefficients (underlined)¹       .66     .68
S.D. of appropriate cluster coefficients (underlined)¹       .07     .08

Note. The predictor intercorrelations and the correlations of the 26
attributes with Core Technical Proficiency for the four clusters (M, A,
C, E) were estimated by the pooled correlations of the Batch A MOS in
each cluster and, for the General group, by pooling all of the Batch A
correlations.

¹ Underlined coefficients indicate the appropriate cluster for each MOS.


Addendum Table 14

Absolute and Discriminant Validity Coefficients for Predicting
Core Technical Proficiency (Computed Across Nine Batch Z MOS)
for Equations Developed from Various Methods, Using RHOXX and RXX

                                        Absolute        Discriminant
                                        Validity          Validity
Equation                              RHOXX    RXX      RHOXX    RXX

"Own" Least Squares                    .70¹    .70       .08     .05
Batch A "MOS-Match" Least Squares      .64     .67       .01     .03
Batch A Cluster Least Squares          .66     .68       .02     .01
Batch A General Least Squares          .68     .68       .00     .00

Full Synthetic (Mean Attribute
  Validities and MOS Mean
  Component Weights)                   .56     .56       .00    -.01
Top 5 Stepwise Reduction               .57     .58       .00    -.02
0-1 Attribute Weights                  .65     .66       .00     .00
0-Mean Attribute Weights               .66     .66       .00     .00
Threshold Component Weights            .59     .58       .01     .00
0-1 Attribute Weights and
  Threshold Component Weights          .63     .65       .02     .01
0-Mean Attribute Weights and
  Threshold Component Weights          .63     .65       .02     .02

¹ The absolute validities for "own" least squares equations were
computed on coefficients adjusted with Rozeboom's Equation #8 (1978).
Other absolute validities were computed on coefficients that did not
require adjustments.

ADDENDUM APPENDIX


Normalized Attribute Weights for Least Squares Equations
and Validities of Synthetically Formed Prediction
Equations for 18 MOS by Different Criterion
Measures and by Different Weighting Methods,
Using RHOXX Rather Than RXX

                            Equations for 18 MOS: Overall Performance, 0-Mean
                            Attribute Weights & Cluster Threshold Component
                            Weights

Addendum Appendix Table 44  Validities of Synthetically Formed Prediction
                            Equations for 18 MOS: Overall Performance, Mean
                            Attribute Validities & Cluster Threshold Component
                            Weights, ASVAB Reduction

Addendum Appendix Table 45  Validities of Least Squares Prediction Equations
                            for 18 MOS: Core Technical Proficiency, No
                            Reduction

Addendum Appendix Table 46  Validities of Least Squares Prediction Equations
                            for 18 MOS: Core Technical Proficiency, ASVAB
                            Reduction

Addendum Appendix Table 47  Validities of Least Squares Prediction Equations
                            for 18 MOS: Overall Performance, No Reduction

Addendum Appendix Table 48  Validities of Least Squares Prediction Equations
                            for 18 MOS: Overall Performance, ASVAB Reduction

Addendum  Appendix  Table  1 

Least  Squares  Beta  Weights  for  18  MOS:  Core  Technical  Proficiency,  No  Reduction 




Addendum Appendix Table 2

Least Squares Beta Weights for 18 MOS: Core Technical
Proficiency, ASVAB Reduction

                 Attribute
MOS       Verb¹     Numb²     Mech³

11B       0.23      0.23      0.31
12B       0.19      0.11      0.44
13B       0.03      0.17      0.25
16S       0.18      0.37      0.01
19K       0.11      0.23      0.36
27E       0.40      0.26      0.14
31C       0.16      0.35      0.18
51B       0.31      0.28      0.30
54B       0.30      0.27      0.21
55B       0.31     -0.03      0.43
63B      -0.13      0.06      0.71
67N       0.20      0.11      0.57
71L       0.22      0.48     -0.12
76Y       0.13      0.47      0.10
88M      -0.02      0.19      0.47
91A       0.35      0.19      0.26
94B       0.16      0.51      0.09
95B       0.24      0.28      0.19

¹ Verb = Project A measure A1AVERBL.
² Numb = Project A measures A1AQUANT + B3CCNMSH.
³ Mech = Project A measure A1ATECH.


Addendum Appendix Table 3

Least Squares Beta Weights for 18 MOS: Overall Performance, No Reduction

Addendum Appendix Table 4

Least Squares Beta Weights for 18 MOS: Overall Job Performance,
ASVAB Reduction

                 Attribute
MOS       Verb¹     Numb²     Mech³

11B       0.11      0.29      0.25
12B       0.15      0.24      0.28
13B      -0.05      0.28      0.21
16S       0.21      0.33      0.03
19K       0.09      0.32      0.27
27E       0.42      0.16      0.18
31C       0.04      0.35      0.27
51B       0.13      0.34      0.26
54B       0.20      0.34      0.18
55B       0.07      0.09      0.40
63B      -0.02      0.11      0.43
67N       0.26      0.13      0.47
71L       0.12      0.33      0.15
76Y       0.10      0.34      0.17
88M      -0.09      0.25      0.43
91A       0.15      0.19      0.31
94B       0.02      0.51      0.00
95B       0.12      0.29      0.34

¹ Verb = Project A measure A1AVERBL.
² Numb = Project A measures A1AQUANT + B3CCNMSH.
³ Mech = Project A measure A1ATECH.